Computer Science, 2020, Vol. 47, Issue (6): 51-58. doi: 10.11896/jsjkx.190300140

• Intelligent Software Engineering •

Analysis of Open Source Software Cliff Walls for Group Collaborative Development

HE Peng1,2, YU Lv-jun1

  1 School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China
  2 Hubei Key Laboratory of Applied Mathematics, Wuhan 430062, China
  • Received: 2019-03-27 Online: 2020-06-15 Published: 2020-06-10
  • Corresponding author: HE Peng (penghe@hubu.edu.cn)
  • About author: HE Peng, born in 1988, Ph.D., associate professor, master's supervisor, member of the China Computer Federation. His main research interests include software measurement, software defect prediction, service recommendation, and complex networks.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018YFB1003801), the National Natural Science Foundation of China (61902114), the Hubei Province Education Department Youth Talent Project (Q20171008) and the Open Fund of the Hubei Key Laboratory of Applied Mathematics (HBAM201901).

Abstract: Because of their low entry threshold and high degree of freedom, open source software projects often suffer from slow progress, low efficiency and low quality during development. A software cliff wall, a criterion for judging project robustness, is a code contribution that, within a short period, far exceeds normal incremental development; it is a potential threat to the sustainable evolution of software. Analyzing the causes of cliff walls is therefore an effective way to study the development process of open source projects in depth, describe software evolution more accurately, and improve development efficiency. This paper studies 9 open source projects on GitHub, each spanning at least 5 years. Using monthly and quarterly periods respectively, it constructs developer collaboration networks (DCNs) from developers' follow and comment information on more than 150 000 commits, and treats any single commit of more than 10 000 lines of code as a software cliff wall. It then uses 9 metrics, namely the number of nodes, the number of edges, the node update rate, modularity, the average path length, the average degree, the node in-degree index, the mean node out-degree, and diversity, to analyze the relationship between DCNs and cliff walls from three perspectives: network scale, network structure and network quality. The results show that: 1) cliff walls tend to occur when the development team is small and member turnover is high; 2) maintaining good "small-world" characteristics among developers helps avoid cliff walls; 3) a quarterly cycle is more appropriate than a monthly one for analyzing the relationship between DCNs and cliff walls, and diverse organizational origins of team members also promote the emergence of cliff walls to some extent.
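The abstract describes two computational steps: flagging any single commit of more than 10 000 lines as a cliff wall, and building a DCN for each monthly or quarterly period and measuring it. The sketch below illustrates both with the Python standard library only. It is not the authors' implementation: the input tuples `(sha, date, lines_changed)` and `(source_dev, target_dev, date)`, the edge direction (source interacted with target's commit), the node update rate formula (share of this period's developers absent in the previous period), and the undirected treatment used for path lengths are all assumptions. Modularity, the node in-degree index and diversity are omitted because the abstract does not define them.

```python
from collections import defaultdict, deque
from datetime import date

# Threshold from the abstract: one commit with more than 10,000 changed lines is a cliff wall.
CLIFF_THRESHOLD = 10_000


def period_key(d: date, granularity: str) -> str:
    """Bucket a date into a monthly ('2019-02') or quarterly ('2019-Q1') period."""
    if granularity == "month":
        return f"{d.year}-{d.month:02d}"
    if granularity == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    raise ValueError(f"unknown granularity: {granularity}")


def cliff_walls_per_period(commits, granularity):
    """commits: iterable of hypothetical (sha, commit_date, lines_changed) tuples.
    Returns {period: number of cliff-wall commits in that period}."""
    counts = defaultdict(int)
    for _sha, d, lines_changed in commits:
        if lines_changed > CLIFF_THRESHOLD:
            counts[period_key(d, granularity)] += 1
    return dict(counts)


def build_dcns(interactions, granularity):
    """interactions: iterable of hypothetical (source_dev, target_dev, date) tuples,
    e.g. source commented on a commit by target. Returns {period: set of directed edges}."""
    dcns = defaultdict(set)
    for src, dst, d in interactions:
        if src != dst:  # skip self-interactions
            dcns[period_key(d, granularity)].add((src, dst))
    return dict(dcns)


def average_path_length(edges):
    """Mean BFS shortest-path length over reachable ordered node pairs,
    treating the directed DCN as undirected (an assumption)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def dcn_metrics(edges, previous_nodes=frozenset()):
    """A subset of the paper's metrics for one period's DCN."""
    nodes = {n for edge in edges for n in edge}
    n, m = len(nodes), len(edges)
    return {
        "nodes": n,
        "edges": m,
        # Assumed definition: share of developers not present in the previous period.
        "node_update_rate": len(nodes - previous_nodes) / n if n else 0.0,
        "avg_degree": 2 * m / n if n else 0.0,  # in-degree + out-degree per node
        "mean_out_degree": m / n if n else 0.0,
        "avg_path_length": average_path_length(edges),
    }


# Toy data (hypothetical) to exercise the functions.
commits = [
    ("a1", date(2019, 1, 5), 120),
    ("b2", date(2019, 2, 9), 15_000),
    ("c3", date(2019, 4, 1), 30),
]
interactions = [
    ("alice", "bob", date(2019, 1, 7)),
    ("bob", "carol", date(2019, 2, 3)),
    ("dave", "alice", date(2019, 4, 2)),
]
print(cliff_walls_per_period(commits, "quarter"))  # {'2019-Q1': 1}
previous = frozenset()
for period, edges in sorted(build_dcns(interactions, "quarter").items()):
    print(period, dcn_metrics(edges, previous))
    previous = frozenset(n for edge in edges for n in edge)
```

Passing `"month"` instead of `"quarter"` gives the monthly view, so the two granularities compared in the paper can be produced from the same inputs; per-period cliff-wall counts and DCN metrics can then be joined on the period key for correlation analysis.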

Key words: Group collaborative development, Developer collaboration network, Software cliff walls, Software evolution, Open source software

CLC number: TP301