Computer Science, 2024, Vol. 51, Issue (10): 187-195. doi: 10.11896/jsjkx.230900071
夏小雅1, 赵生宇2, 韩凡宇1, 毕枫林1, 王伟1, 周烜1, 周傲英1
XIA Xiaoya1, ZHAO Shengyu2, HAN Fanyu1, BI Fenglin1, WANG Wei1, ZHOU Xuan1, ZHOU Aoying1
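Abstract: As open-source software has grown and spread at scale, it has also built an ecosystem of open-source development and collaboration, in which individuals and organizations jointly develop high-quality software that anyone can use. Social collaboration platforms, with GitHub as the leading example, further promote large-scale, distributed, fine-grained code collaboration and technical social interaction: every day, countless developers commit code, review code, report bugs, or request new features on them. Mining valuable information from this massive volume of collaboration-behavior data is a current research challenge. This paper therefore designs and implements OpenDigger, a one-stop data mining system for the open-source collaboration digital ecosystem, which aims to build data infrastructure for the open-source domain and to promote the sustainable development of the open-source ecosystem. OpenDigger consists mainly of a data collection service, a data storage module, a label data module, and an information service module. Built on an OLAP columnar database and a graph database, it continuously collects multi-source open-source ecosystem data and provides a range of open-source information services to different user groups through a unified interface. OpenDigger mines key information in the open-source digital ecosystem from the perspective of collaboration networks; compared with traditional statistical metrics, this perspective better reveals how open-source projects and developers are related. Users can model and analyze open-source ecosystem data through an online analysis environment or a CLI tool. OpenDigger serves a number of companies and communities, including Ant Financial, Alibaba, and the Mulan Open Source Community, and provides open-source digital insight to OSPO (Open Source Program Office) practitioners and to the people responsible for running open-source projects.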
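The collaboration-network perspective described in the abstract can be illustrated with a minimal sketch. It assumes activity records are (developer, repository, GitHub event type) triples. The event weights, the shared-developer projection onto repositories, the use of weighted PageRank to rank repositories, and the function names (`build_bipartite`, `project_repos`, `pagerank`) are all hypothetical choices for illustration, not OpenDigger's documented method or API.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical per-event weights (an assumption for illustration only).
# Event types not listed here (e.g. WatchEvent) get weight 0 and are ignored.
EVENT_WEIGHTS = {
    "IssuesEvent": 2.0,
    "IssueCommentEvent": 1.0,
    "PullRequestEvent": 3.0,
    "PullRequestReviewCommentEvent": 4.0,
}


def build_bipartite(events):
    """Aggregate (developer, repo, event_type) triples into weighted developer-repo edges."""
    edges = defaultdict(float)
    for actor, repo, event_type in events:
        weight = EVENT_WEIGHTS.get(event_type, 0.0)
        if weight > 0:
            edges[(actor, repo)] += weight
    return dict(edges)


def project_repos(edges):
    """Project the bipartite graph onto repositories.

    Two repositories are linked with weight = number of developers active in both.
    """
    repos_by_actor = defaultdict(set)
    for actor, repo in edges:
        repos_by_actor[actor].add(repo)
    links = defaultdict(int)
    for repos in repos_by_actor.values():
        for a, b in combinations(sorted(repos), 2):
            links[(a, b)] += 1
    return dict(links)


def pagerank(links, damping=0.85, max_iter=100, tol=1e-10):
    """Weighted PageRank by power iteration on the undirected repo-repo graph."""
    neighbors = defaultdict(dict)
    for (a, b), w in links.items():
        neighbors[a][b] = neighbors[a].get(b, 0) + w
        neighbors[b][a] = neighbors[b].get(a, 0) + w
    nodes = sorted(neighbors)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            total = sum(neighbors[v].values())  # > 0: every node has a neighbor
            for u, w in neighbors[v].items():
                new[u] += damping * rank[v] * w / total
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy activity records (fabricated sample input, not real data).
SAMPLE_EVENTS = [
    ("alice", "org/a", "PullRequestEvent"),
    ("alice", "org/a", "IssueCommentEvent"),
    ("alice", "org/b", "IssuesEvent"),
    ("bob", "org/a", "PullRequestReviewCommentEvent"),
    ("bob", "org/b", "IssueCommentEvent"),
    ("carol", "org/c", "PullRequestEvent"),
    ("carol", "org/a", "IssueCommentEvent"),
    ("dave", "org/d", "WatchEvent"),
]

if __name__ == "__main__":
    edges = build_bipartite(SAMPLE_EVENTS)
    links = project_repos(edges)
    for repo, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
        print(f"{repo}: {score:.4f}")
```

On the toy input, `org/a` ranks highest because it is the repository connected to both others through shared contributors. A real system would tune the weights and might rank developers as well as repositories; the bipartite-then-project structure is only one plausible way to model the developer-project relationships the abstract refers to.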