Computer Science, 2022, Vol. 49, Issue (12): 99-108. doi: 10.11896/jsjkx.220400289

• Computer Software •

Developer Recommendation Method for Crowdsourcing Tasks in Open Source Community

JIANG Jing, PING Yuan, WU Qiu-di, ZHANG Li

  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • Received: 2022-04-28 Revised: 2022-06-10 Published: 2022-12-14
  • Corresponding author: ZHANG Li (lily@buaa.edu.cn)
  • About author: JIANG Jing (jiangjing@buaa.edu.cn), born in 1985, Ph.D, associate professor. Her main research interests include intelligent software engineering, empirical software engineering, open source software and software repository mining. ZHANG Li, born in 1968, Ph.D, professor. Her main research interests include software modeling and analysis, requirement engineering, empirical software engineering and software architecture.
  • Supported by:
    National Key Research and Development Program of China (2018AAA0102304), National Natural Science Foundation of China (62177003) and Fundamental Research Funds for the Central Universities of Ministry of Education of China (YWF-20-BJ-J-1018).

Abstract: Gitcoin is a crowdsourcing platform built on the open-source community GitHub. In Gitcoin, project teams publish development tasks, developers register for the tasks that interest them, and the publisher selects suitable developers to complete each task and pays a bounty. However, some tasks fail for lack of registrants, some are not completed to an acceptable standard, and even successfully completed tasks suffer from long intervals before developers register. A developer recommendation method is therefore needed to quickly find suitable developers for crowdsourcing tasks, shorten the time until developers register, identify potentially suitable developers and motivate them to register, and so promote the successful completion of crowdsourcing tasks. This paper proposes DEVRec (Developer Recommendation), a developer recommendation method based on the LGBM classification algorithm. The method extracts task features, developer features, and features of the relationship between developers and tasks from the crowdsourcing task assignment records, uses the LGBM classification algorithm for binary classification to compute the probability that a developer registers for a task, and finally produces a list of recommended developers for the task. To evaluate the recommendations, 1,599 completed crowdsourcing tasks, 343 task publishers, and 1,605 developers were crawled from the Gitcoin platform. Experimental results show that, compared with the baseline Policy Model, DEVRec improves the top-1, top-3, top-5, and top-10 recommendation accuracy and the MRR by 73.11%, 119.07%, 86.55%, 29.24%, and 62.27%, respectively.

Key words: Open-source software, Developer recommendation, Crowdsourcing development, Feature extraction, Machine learning
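
The ranking and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the developer names, and the probabilities are hypothetical, and the probabilities stand in for the output of a trained binary classifier. The sketch also assumes that top-k accuracy counts a task as a hit when at least one developer who actually registered appears among the first k recommendations, and that MRR averages the reciprocal rank of the first registered developer; the paper may define these metrics differently.

```python
from typing import Dict, List, Set


def rank_developers(scores: Dict[str, float]) -> List[str]:
    """Order candidates by predicted registration probability, highest first.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    return sorted(scores, key=lambda dev: (-scores[dev], dev))


def top_k_accuracy(rankings: List[List[str]], registered: List[Set[str]], k: int) -> float:
    """Fraction of tasks with at least one registered developer in the top k."""
    hits = sum(
        1 for ranked, actual in zip(rankings, registered) if actual & set(ranked[:k])
    )
    return hits / len(rankings)


def mean_reciprocal_rank(rankings: List[List[str]], registered: List[Set[str]]) -> float:
    """Average of 1/rank of the first registered developer (0 if none is ranked)."""
    total = 0.0
    for ranked, actual in zip(rankings, registered):
        for position, dev in enumerate(ranked, start=1):
            if dev in actual:
                total += 1.0 / position
                break
    return total / len(rankings)


# Hypothetical classifier outputs for two tasks (assumed values, not real data).
task_scores = [
    {"alice": 0.9, "bob": 0.4, "carol": 0.1},
    {"alice": 0.2, "bob": 0.7, "carol": 0.5},
]
# Developers who actually registered for each task (also assumed).
registered = [{"alice"}, {"carol"}]

rankings = [rank_developers(scores) for scores in task_scores]
print(rankings)
print(top_k_accuracy(rankings, registered, k=1))
print(mean_reciprocal_rank(rankings, registered))
```

In this toy setup the first task is a top-1 hit and the second is a hit only from rank 2 onward, which is why the two metrics give different values.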

CLC Number:

  • TP311