Computer Science, 2020, Vol. 47, Issue (10): 26-31. doi: 10.11896/jsjkx.191100086

Special Issue: Mobile Crowd Sensing and Computing


Truth Inference Based on Confidence Interval of Small Samples in Crowdsourcing

ZHANG Guang-yuan, WANG Ning   

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2019-11-11 Revised: 2020-04-27 Online: 2020-10-15 Published: 2020-10-16
  • About author: ZHANG Guang-yuan, born in 1994, M.S., is a member of China Computer Federation. Her main research interests include data quality and crowdsourcing.
    WANG Ning, born in 1967, Ph.D., professor, is a member of China Computer Federation. Her main research interests include web data integration, big data management and crowdsourcing.
  • Supported by:
    National Key R&D Program of China (2018YFC0809800)

Abstract: Crowdsourcing is an increasingly important area of computer applications because it can address problems that are difficult for computers to handle alone. Because crowdsourcing is open to anyone, quality control is one of its main challenges. To make truth inference effective, current research generally evaluates worker quality and relies on the answers of trustworthy workers to infer truths. However, most existing methods ignore the long-tail phenomenon in crowdsourcing, and little research addresses truth inference when the number of tasks completed by each worker is generally small. Considering the characteristics of different task types, the long-tail phenomenon and worker answers, this paper constructs a small-sample confidence interval to solve truth inference in that setting. First, worker quality is pre-estimated using a gold-standard answer strategy, and different truth initialization methods are adopted according to the pre-estimation result. Then, a small-sample confidence interval is constructed to evaluate worker quality accurately. Finally, task truths are inferred and worker quality is updated iteratively. To verify the effectiveness of the proposed method, experiments are conducted on 5 real datasets. Compared with existing methods, the proposed method handles the long-tail phenomenon effectively, especially when the number of tasks completed by each worker is generally small. The average accuracy of the proposed method on single-choice tasks reaches 93%, which is 16% higher than the best performance of existing methods. Meanwhile, the MAE and RMSE of the proposed method on numerical tasks are lower than those of existing methods.
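The three steps in the abstract (gold-standard pre-estimation, confidence-interval-based worker quality, iterative truth inference) can be illustrated for single-choice tasks with a minimal sketch. The abstract does not say which small-sample interval the paper constructs, so the Wilson score lower bound used here is an assumption, as are the neutral weight of 0.5 for workers without gold answers and all function and variable names. It is not the authors' implementation.

```python
import math
from collections import defaultdict


def wilson_lower_bound(correct, total, z=1.96):
    """Lower end of the Wilson score interval for a worker's accuracy.

    Assumption: the Wilson interval stands in for the paper's
    small-sample confidence interval, which the abstract does not specify.
    """
    if total == 0:
        return 0.0
    p_hat = correct / total
    denom = 1 + z * z / total
    centre = p_hat + z * z / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total))
    return max(0.0, (centre - margin) / denom)


def tally(answers, reference):
    """Count [correct, total] answers per worker against reference labels."""
    counts = defaultdict(lambda: [0, 0])
    for task, label in reference.items():
        for worker, ans in answers.get(task, {}).items():
            counts[worker][0] += int(ans == label)
            counts[worker][1] += 1
    return counts


def infer_truths(answers, gold, max_iter=20):
    """answers: {task: {worker: label}}; gold: {task: label} for a few tasks."""
    workers = {w for votes in answers.values() for w in votes}

    # Step 1: pre-estimate worker quality on the gold-standard tasks.
    # Workers with no gold answers get a neutral weight of 0.5 (assumption).
    counts = tally(answers, gold)
    quality = {
        w: wilson_lower_bound(*counts[w]) if counts[w][1] else 0.5
        for w in workers
    }

    truths = {}
    for _ in range(max_iter):
        # Step 3a: infer truths by quality-weighted voting.
        new_truths = dict(gold)
        for task, votes in answers.items():
            if task in gold or not votes:
                continue
            scores = defaultdict(float)
            for worker, ans in votes.items():
                scores[ans] += quality[worker]
            # Ties are broken by label order to keep the result deterministic.
            new_truths[task] = max(sorted(scores), key=scores.get)
        if new_truths == truths:
            break
        truths = new_truths

        # Steps 2 and 3b: re-evaluate worker quality against the current
        # truths, using the interval's lower bound so that workers with
        # few answers are not over-trusted.
        counts = tally(answers, truths)
        quality = {w: wilson_lower_bound(*counts[w]) for w in workers}
    return truths, quality


# Toy data: worker "a" is reliable; "b" and "c" answered few tasks and were wrong.
answers = {
    "g1": {"a": "X", "b": "Y"},
    "g2": {"a": "Y", "c": "X"},
    "t1": {"a": "X", "b": "Y", "c": "Y"},
}
gold = {"g1": "X", "g2": "Y"}
truths, quality = infer_truths(answers, gold)
print(truths["t1"])
```

On this toy input, plain majority voting would pick "Y" for task `t1`, while the weighted vote follows worker `a`, whose lower bound is the only one above zero. Using the lower bound rather than the raw accuracy is the design choice that matters for long-tail data: a worker who is right on 2 of 2 tasks is trusted less than one who is right on 20 of 20.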

Key words: Crowdsourcing, Long-tail phenomenon, Small sample confidence interval, Truth inference, Worker quality estimation

CLC Number: TP391.1