Computer Science ›› 2023, Vol. 50 ›› Issue (10): 282-290. doi: 10.11896/jsjkx.221000133

• Computer Network •

Bidirectional Quality Control Strategies Based on CIDA and PI-Cosine in Crowdsourcing

LIU Qingju, PAN Qingxian, TONG Xiangrong, YU Song, PAN Yanan

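  1. School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005
  • Received: 2022-10-17 Revised: 2023-03-21 Published: 2023-10-10 Released: 2023-10-10
  • Corresponding author: PAN Qingxian (pqx@ytu.edu.cn)
  • About the author: (liuqingju9763@163.com)
  • Funding:
    National Natural Science Foundation of China (60903098, 61502140, 61572418, 61472095, 62072392); Natural Science Foundation of Heilongjiang (LH2020F023); Key Research Project of Undergraduate Teaching Reform in Shandong Province (Z2022327)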

Bidirectional Quality Control Strategies Based on CIDA and PI-Cosine in Crowdsourcing

LIU Qingju, PAN Qingxian, TONG Xiangrong, YU Song, PAN Yanan   

  1. School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005, China
  • Received: 2022-10-17 Revised: 2023-03-21 Online: 2023-10-10 Published: 2023-10-10
  • About author: LIU Qingju, born in 1997, postgraduate, is a member of China Computer Federation. Her main research interest is mobile crowdsourcing. PAN Qingxian, born in 1979, Ph.D. candidate, associate professor, is a member of China Computer Federation. His main research interests include artificial intelligence and machine learning.
  • Supported by:
    National Natural Science Foundation of China (60903098, 61502140, 61572418, 61472095, 62072392), Natural Science Foundation of Heilongjiang, China (LH2020F023) and Key Research Project of Undergraduate Teaching Reform in Shandong Province (Z2022327).

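Abstract: As mobile smart terminals have spread, collecting large-scale sensing data through crowdsourcing has become increasingly easy. Because crowdsourcing workers are self-interested, they seek the highest reward for the least effort, and may even collude with one another or submit data carelessly, which lowers the quality of completed tasks. This paper proposes a jury-based quality control strategy, a mechanism that addresses the data verification problem. To counter behavior that degrades crowdsourcing quality, it first determines whether spam workers and colluding groups are present, then uses the community influence detection algorithm (CIDA) to identify the leaders of colluding groups and the groups they belong to, and finally uses an improved similarity detection algorithm (PI-Cosine) to screen out spam workers. Crowdsourced data quality is improved from these two directions. Experimental results show that the proposed method improves on the Cosine similarity detection algorithm by 12.3% on the accuracy and F1-score metrics.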

Key words: Crowdsourcing, Quality control, CIDA algorithm, PI-Cosine similarity detection, Spam

Abstract: With the popularity of mobile smart terminals, crowdsourced collection of large-scale sensing data is becoming ever easier. Selfish crowd workers want the most pay for the least effort, and may even collude with each other or submit crowdsourced data arbitrarily, resulting in poor task completion quality. This paper proposes a jury-based quality control strategy, a mechanism that solves the data validation problem. To address behaviors that degrade crowdsourcing quality, the paper first determines whether spam workers and collusive organizations are present, then uses the proposed community influence detection algorithm (CIDA) to detect collusion leaders and their organizations, and finally uses an improved similarity detection algorithm (PI-Cosine) to screen out spam workers. Data quality is improved from these two aspects. Experiments show that the proposed method improves on the Cosine similarity detection algorithm by 12.3% in the accuracy and F1-score measures.

Key words: Crowdsourcing, Quality control, CIDA algorithm, PI-Cosine similarity detection, Spam

CLC Number:

  • TP391
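
For context on the comparison reported in the abstract, the following is a minimal sketch of a plain cosine-similarity baseline together with the accuracy and F1-score metrics. It does not implement PI-Cosine or CIDA, whose details are not given on this page. The function names, the idea of comparing each submission with a single reference vector, the threshold, and the toy data are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def flag_low_similarity(submissions, reference, threshold):
    """Return ids of workers whose submission is less similar to the
    reference vector than the threshold (hypothetical screening rule)."""
    return {
        worker
        for worker, vector in submissions.items()
        if cosine_similarity(vector, reference) < threshold
    }


def accuracy_and_f1(flagged, true_spam, all_workers):
    """Accuracy and F1-score of a flagged set against ground-truth spam ids."""
    tp = len(flagged & true_spam)
    fp = len(flagged - true_spam)
    fn = len(true_spam - flagged)
    tn = len(all_workers) - tp - fp - fn
    accuracy = (tp + tn) / len(all_workers)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy data (assumed): each worker's answers encoded as a 0/1 vector.
submissions = {"w1": [1, 0, 1, 1], "w2": [1, 0, 1, 0], "w3": [0, 1, 0, 0]}
reference = [1, 0, 1, 1]   # assumed trusted answer vector
true_spam = {"w2", "w3"}   # assumed ground truth

flagged = flag_low_similarity(submissions, reference, threshold=0.5)
accuracy, f1 = accuracy_and_f1(flagged, true_spam, set(submissions))
print(flagged, round(accuracy, 3), round(f1, 3))
```

In this toy run only `w3` falls below the threshold, so `w2` is a missed spam worker and both accuracy and F1-score come out at 2/3. A baseline of this kind is the reference point against which the abstract reports its 12.3% improvement.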