Computer Science ›› 2020, Vol. 47 ›› Issue (3): 25-33.doi: 10.11896/jsjkx.191000087

Special Issue: Intelligent Software Engineering

• Intelligent Software Engineering •

Approach of Automatic Fork Summary Generation in Open Source Community Based on Feature Extraction

ZHANG Chao (1), MAO Xin-jun (1, 2), LU Yao (1)

  1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410000, China
  2. Key Laboratory of Complex System Software Engineering, Changsha 410000, China
  • Received:2019-10-15 Online:2020-03-15 Published:2020-03-30
  • About author: ZHANG Chao, born in 1991, postgraduate, is a member of the China Computer Federation. His main research interests include software engineering and open source community. MAO Xin-jun, born in 1970, Ph.D., professor, is a member of the China Computer Federation. His main research interests include software engineering and open source community.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018YFB1004202) and Research on Mechanism and Method of Massive Online Collaborative Learning (61532004).

Abstract: Distributed collaborative development based on pull requests (P/R) has become the dominant software development method in open source communities. Because development under the P/R model is open, transparent and parallel, it is difficult for developers to obtain a complete profile of the Forks of a whole project, or to know whether other developers have already completed the same or similar development tasks. This easily leads to duplicate contributions and redundant development. To solve this problem, this paper proposes a method for automatically generating Fork summaries, which helps project managers strengthen project management, avoid redundant contributions, and improve cooperation and communication among developers. The method first crawls Issue data carrying feature and Bug labels from the open source community and trains a random forest classifier to classify Fork features. It then collects data on the software development activities of each Fork branch and uses the TextRank algorithm to generate detailed Fork information that explains the main purpose of the Fork activity. Finally, a set of combination rules and a corresponding algorithm integrate the Fork's category, features and other information into a complete Fork summary. To validate the effectiveness of the method, 30 groups of manual tests and 60 groups of live studies were conducted on GitHub. The results show that the accuracy of the Fork summaries generated by this method is 67.2%. In the experiment, 76% of project managers believed that Fork summaries can help them manage projects better and strengthen communication and cooperation.
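
The TextRank step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which similarity function, damping factor or input text the method uses, so the code below assumes a word-overlap similarity in the style of the original TextRank paper, a damping factor of 0.85, and hypothetical commit messages as the record of a Fork's development activity. All function names are illustrative.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(tokens_a, tokens_b):
    """Word-overlap similarity, normalised by the log of both lengths (assumed)."""
    if len(tokens_a) < 2 or len(tokens_b) < 2:
        return 0.0  # log(1) = 0 would make the normaliser degenerate
    overlap = len(set(tokens_a) & set(tokens_b))
    return overlap / (math.log(len(tokens_a)) + math.log(len(tokens_b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Return one score per sentence via weighted PageRank power iteration."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric similarity graph with no self-loops.
    weights = [[0.0 if i == j else similarity(tokens[i], tokens[j])
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, top_k=1):
    """Pick the top_k highest-scoring sentences, kept in original order."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]
```

Called on a short list of related commit messages, `summarize(messages, top_k=1)` returns the message that shares the most vocabulary with the others, which is the sense in which TextRank can surface the main purpose of a Fork. A message with no words in common with the rest keeps only the baseline score of `1 - damping`.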

Key words: Distributed cooperative development, Fork summary, Open source community, Open source

CLC Number: 

  • TP311