Computer Science (计算机科学), 2020, Vol. 47, Issue (10): 26-31. doi: 10.11896/jsjkx.191100086
Special topic: Crowd Sensing Computing
ZHANG Guang-yuan (张光园), WANG Ning (王宁)
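Abstract: Crowdsourcing workers vary widely in skill, which makes quality control one of the central challenges of crowdsourcing. Most existing work ensures the validity of the final answers by evaluating worker quality, but it often overlooks the long-tail phenomenon that is common in crowdsourcing tasks. Taking into account the different task types, the characteristics of the long-tail phenomenon, and how workers actually complete tasks, this paper proposes constructing small-sample confidence intervals to estimate worker quality, so that answers can still be decided when most workers complete only a few tasks. First, worker quality is pre-evaluated with a gold-standard answer strategy, and, depending on the distribution of worker quality, different truth initialization methods are applied to numerical tasks and single-choice tasks. Next, small-sample confidence intervals are constructed to assess worker quality accurately. Finally, task answers are decided and worker quality is updated iteratively. To verify the effectiveness of the proposed method, experiments were carried out on five real datasets; compared with existing methods, the proposed method handles the long-tail phenomenon well. In particular, when most workers complete only a few tasks, the proposed method reaches an average accuracy of 93% on the single-choice task datasets, 16% higher than the best result of existing methods, and its MAE and RMSE on the numerical task datasets are both lower than those of existing methods.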
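The idea of discounting workers who have answered only a few tasks can be illustrated with a minimal sketch. The abstract does not say which small-sample interval the authors construct, so the Wilson score interval for a binomial proportion is used here purely as an assumption. The worker names, gold-standard counts, and the choice of the interval's lower bound as a voting weight are likewise hypothetical, and the iterative update step is omitted.

```python
import math
from collections import defaultdict


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumption: the paper's exact interval is not given in the abstract;
    Wilson is a common choice when the sample size is small.
    """
    if total == 0:
        return 0.0, 1.0  # no evidence at all: the interval is uninformative
    p_hat = correct / total
    denom = 1 + z * z / total
    center = (p_hat + z * z / (2 * total)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / total + z * z / (4 * total * total)
    ) / denom
    return max(0.0, center - half), min(1.0, center + half)


def weighted_vote(answers, quality):
    """Return the label with the largest total worker weight.

    answers: {worker: label}, quality: {worker: weight}.
    """
    scores = defaultdict(float)
    for worker, label in answers.items():
        scores[label] += quality[worker]
    return max(scores, key=scores.get)


# Hypothetical gold-standard results: (correct, attempted) per worker.
gold = {"w1": (2, 2), "w2": (18, 20), "w3": (1, 3)}

# Use the lower bound of the interval as a conservative quality estimate.
quality = {w: wilson_interval(c, n)[0] for w, (c, n) in gold.items()}

# Hypothetical answers to one single-choice task.
answers = {"w1": "A", "w2": "B", "w3": "A"}
decision = weighted_vote(answers, quality)
```

With raw accuracies as weights, the two workers with very few gold-standard answers would outvote the worker with 18 correct out of 20. With the interval's lower bound as the weight, the well-evidenced worker carries the decision, which is the long-tail effect the abstract describes.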