Computer Science (计算机科学), 2020, Vol. 47, Issue (3): 25-33. doi: 10.11896/jsjkx.191000087
Special topic: Intelligent Software Engineering
ZHANG Chao (张超)1, MAO Xin-jun (毛新军)1,2, LU Yao (卢遥)1
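Abstract: Distributed collaborative development based on pull requests (P/R) has become the dominant software development model in open source communities. Developers fork a project to copy its repository, create their own branch, and develop independently on that branch. Because the P/R collaborative development model is open, transparent, and parallel, developers who fork a project find it hard to get an overview of the project's existing forks and cannot tell whether other developers have already started the same or similar work in their own forks. This easily leads to duplicate contributions and redundant development. To address this problem, the paper proposes a method for automatically generating Fork summaries that helps project maintainers strengthen project control, avoid redundant contributions, and improve cooperation and communication. The method first crawls Issue data carrying Feature and Bug labels from open source communities and trains a random forest classifier to classify Fork features. It then collects the software development activity data of Fork branches and uses the TextRank algorithm to generate Fork details that explain the main purpose of each Fork. Finally, a set of combination rules and a corresponding algorithm integrate the Fork's category, features, and other information into a complete Fork summary. To evaluate how well the proposed method guides distributed collaborative development, 30 groups of manual tests and 60 groups of real-case tests were carried out on GitHub. The results show that the generated Fork summaries reach an accuracy of 67.2%, and that 76% of the project maintainers in the experiment considered Fork summaries helpful for managing projects and for strengthening communication and collaboration.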
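The abstract names TextRank as the algorithm behind the Fork details but gives no implementation specifics. The following is a minimal sketch of sentence-level TextRank, not the authors' implementation. It assumes that commit messages serve as the input sentences and that similarity is the word-overlap measure from the original TextRank formulation. The function names (`tokenize`, `similarity`, `textrank`, `summarize`) and the sample commit messages are hypothetical.

```python
import math
import re


def tokenize(text):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(a, b):
    """Word overlap normalized by the log lengths of both token lists."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:  # both sentences contain a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over the similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric edge weights; a sentence has no edge to itself.
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, top_k=2):
    """Return the top_k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]


# Hypothetical commit messages from one fork (illustrative data only).
commits = [
    "Add OAuth login support for the web client",
    "Fix OAuth login redirect bug in the web client",
    "Update README badges",
    "Add tests for OAuth login support",
]
print(summarize(commits, top_k=2))
```

In this sketch, a message that shares no words with the others receives only the baseline score `1 - damping`, so unrelated housekeeping commits tend to drop out of the summary. How the paper's method preprocesses text or selects sentences is not stated in the abstract and may differ.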