Computer Science ›› 2020, Vol. 47 ›› Issue (10): 19-25. doi: 10.11896/jsjkx.191200164

Special topic: Crowd Sensing Computing

Crowdsourcing Collaboration Process Recovery Method

WANG Kuo, WANG Zhong-jie

  1. Research Center on Intelligent Computing for Enterprises & Services, Harbin Institute of Technology, Harbin 150001, China
  • Received: 2019-12-27 Revised: 2020-05-08 Online: 2020-10-15 Published: 2020-10-16
  • Corresponding author: WANG Zhong-jie (rainy.wang@gmail.com)
  • Author e-mail: hitwangkuo@sina.com
  • About author: WANG Kuo, born in 1990, Ph.D, is a member of China Computer Federation. His main research interests include social software engineering and crowdsourcing, software repository mining and service recommendation.
    WANG Zhong-jie, born in 1978, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. His main research interests include service computing, service engineering, Internet services and cloud computing, social networking services, social software engineering and crowdsourcing, and software repository mining.
  • Supported by:
    National Natural Science Foundation of China (61772155)

Abstract: Crowdsourcing is a distributed problem-solving mechanism that applies collective intelligence. It is now widespread in Internet application scenarios built on human intellectual activity, where large numbers of Internet users collaborate to solve complex problems that a single person cannot solve. Crowdsourcing collaboration has contributed greatly to the development of the open source field. Taking the development and maintenance of open source software as an example, participants jointly complete key tasks such as writing code and fixing bugs through dedicated platforms. Unlike traditional business process management (BPM), collaborative processes in crowdsourcing scenarios face several challenges: the process structure cannot be determined in advance, the number of participants cannot be known beforehand, and the duration and outcome of the collaboration cannot be predicted. These challenges make it very difficult to control the efficiency and quality of crowdsourcing collaboration. Targeting the series of collaborative behaviors that multiple participants produce in chronological order (expressed as natural-language text), this paper uses natural language processing and artificial intelligence methods to propose recovery algorithms for the crowdsourcing collaboration process. An empirical study is carried out on the collaboration among participants during bug fixing in open source software development. Three methods are tried for recovering the collaboration process: text similarity, keyword matching, and neural-network-based intent understanding. The accuracy of the recovery algorithms is then compared quantitatively, and keyword matching is found to give the highest accuracy and the best results. Finally, recovery and visualization are implemented for the collaboration processes to be analyzed. This research can help coordinators of crowdsourcing processes (such as open source project managers) understand the problem-solving process in crowdsourcing collaboration more intuitively and discover typical patterns of collaboration, so as to make accurate predictions about the nature of the collaborative process of new crowdsourcing tasks.

Key words: Bug fix, Collaborative process, Crowdsourcing, Open source software development, Process recovery
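
To make the keyword matching idea from the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a bug discussion is available as a chronological list of (author, comment) pairs, and it labels each comment with a process step by counting keyword hits. The step names, the keyword lists, and the function names `label_comment` and `recover_process` are all hypothetical and chosen only for illustration.

```python
# Hypothetical step vocabulary: neither the step names nor the keywords
# come from the paper; they only illustrate keyword matching.
STEP_KEYWORDS = {
    "report": ("crash", "error", "fails", "exception"),
    "reproduce": ("reproduce", "steps to"),
    "diagnose": ("root cause", "caused by", "because"),
    "propose_fix": ("patch", "fix", "pull request"),
    "verify": ("tested", "works", "confirmed", "verified"),
}


def label_comment(text):
    """Return the step whose keywords occur most often in the comment.

    Matching is plain case-insensitive substring counting. Ties go to the
    step listed first in STEP_KEYWORDS; a comment with no hits is "other".
    """
    lowered = text.lower()
    best_step, best_hits = "other", 0
    for step, keywords in STEP_KEYWORDS.items():
        hits = sum(lowered.count(keyword) for keyword in keywords)
        if hits > best_hits:
            best_step, best_hits = step, hits
    return best_step


def recover_process(comments):
    """Map a chronological list of (author, text) pairs to (author, step) pairs."""
    return [(author, label_comment(text)) for author, text in comments]


# Made-up discussion thread, used only to exercise the sketch.
thread = [
    ("alice", "App throws an exception on startup"),
    ("bob", "I can reproduce this on Linux"),
    ("bob", "The root cause is a null config value"),
    ("carol", "Submitted a patch in a pull request"),
    ("alice", "Confirmed, the patch works for me"),
    ("dave", "Thanks everyone"),
]
process = recover_process(thread)
```

The resulting sequence of (author, step) pairs is one simple form a recovered collaboration process could take; it could then be drawn as a timeline or a directed graph for visualization. Substring counting is deliberately naive here (for instance, "fix" also matches "prefix"), so a real implementation would likely tokenize the text first.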

CLC Number: TP315