Computer Science, 2025, Vol. 52, Issue (6A): 250200123-7. doi: 10.11896/jsjkx.250200123

• Information Security •

Privacy Preservation of Crowdsourcing Content Based on Adversarial Generative Networks

HUANG Xiaoyu(1,2), JIANG Hemeng(1), LING Jiaming(1)

  1 Department of Electronic Business, South China University of Technology, Guangzhou 510006, China
  2 Guangdong Artificial Intelligence and Digital Economy Laboratory (Guangzhou), Guangzhou 510335, China
  • Online: 2025-06-16  Published: 2025-06-12
  • Corresponding author: HUANG Xiaoyu (echxy@scut.edu.cn)
  • About author: HUANG Xiaoyu, born in 1977, Ph.D., associate professor. His main research interests include machine learning, statistics, and optimization theory with applications.
  • Supported by:
    Guangdong Philosophy and Social Science Planning Project (GD21CGL02) and Young Scholars Program of Guangdong Laboratory of Artificial Intelligence and Digital Economy (Guangzhou) (PZL2021KF0027).

Abstract: Crowdsourcing is an emerging outsourcing strategy that aims to make use of the wisdom of the crowd. Because it is cheap and efficient, crowdsourcing is widely recognized as an ideal solution for processing tasks on massive data, such as data labeling and model training. In crowdsourcing, however, task owners who want to benefit from the wisdom of unknown workers must first make their private data publicly accessible without restriction, which carries a serious risk of information leakage. To address this issue, we propose PrivCS, a crowdsourcing model that protects the privacy of task content. The essential idea of PrivCS is to synthesize new data from the task owners' private data and publish the synthetic data to the workers in place of the real data. The tool we adopt to synthesize the new data is the generative adversarial network (GAN). Prior work has shown that GANs are privacy-preserving, so PrivCS inherits this protection, which rests on the theoretical properties of GANs. We also study the theoretical performance of PrivCS. Our analysis shows that its outputs are comparable to those derived from the real data in both data labeling and model training tasks. Our experimental results support these theoretical findings.
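A toy simulation can illustrate the pipeline described in the abstract. In that pipeline, the task owner synthesizes data from private records, workers label the synthetic data, and a model is trained on the crowd labels. This is a minimal sketch under stated assumptions, not the PrivCS implementation:

- The GAN is replaced by a diagonal Gaussian sampler (`fit_stand_in_generator`).
- Workers are simulated as noisy copies of a hypothetical ground-truth rule (`true_label`) with a fixed error rate.
- Their votes are combined by majority vote, an aggregation rule the abstract does not specify.

All function names are hypothetical. The sketch only compares the utility of the two flows on a toy task. It demonstrates nothing about privacy guarantees.

```python
import math
import random
import statistics


def true_label(x):
    # Hypothetical ground-truth rule that honest workers try to apply.
    return 1 if x[0] + 2 * x[1] > 0.5 else 0


def fit_stand_in_generator(data):
    # Stand-in for a trained GAN generator (NOT a GAN): a diagonal Gaussian
    # fitted to the private, unlabeled records. Only per-feature means and
    # standard deviations are kept; no real record is ever published.
    params = [(statistics.fmean(col), statistics.stdev(col)) for col in zip(*data)]

    def sample(n, rng):
        return [tuple(rng.gauss(m, s) for m, s in params) for _ in range(n)]

    return sample


def crowd_label(x, n_workers, error_rate, rng):
    # Each simulated worker flips the true label with probability error_rate;
    # the votes are aggregated by majority (n_workers should be odd).
    votes = sum(true_label(x) ^ (rng.random() < error_rate) for _ in range(n_workers))
    return 1 if votes > n_workers // 2 else 0


def train_logreg(xs, ys, rng, epochs=20, lr=0.1):
    # Plain logistic regression fitted by SGD; the last weight is the bias.
    w = [0.0] * (len(xs[0]) + 1)
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = xs[i], ys[i]
            z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
            z = max(-30.0, min(30.0, z))  # avoid overflow in math.exp
            g = 1.0 / (1.0 + math.exp(-z)) - y
            for j, xj in enumerate(x):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w


def accuracy(w, xs, ys):
    # Fraction of points whose predicted class matches the reference label.
    hits = sum(
        (sum(wj * xj for wj, xj in zip(w, x)) + w[-1] > 0) == bool(y)
        for x, y in zip(xs, ys)
    )
    return hits / len(xs)


def run_demo(seed=0, n=2000, n_workers=5, error_rate=0.2):
    rng = random.Random(seed)
    # The task owner's private records and a held-out real test set.
    private = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    test = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    test_y = [true_label(x) for x in test]

    # Conventional crowdsourcing: workers see and label the real records.
    real_y = [crowd_label(x, n_workers, error_rate, rng) for x in private]
    acc_real = accuracy(train_logreg(private, real_y, rng), test, test_y)

    # PrivCS-style flow: publish synthetic records instead of the real ones.
    synthetic = fit_stand_in_generator(private)(n, rng)
    syn_y = [crowd_label(x, n_workers, error_rate, rng) for x in synthetic]
    acc_syn = accuracy(train_logreg(synthetic, syn_y, rng), test, test_y)
    return acc_real, acc_syn
```

`run_demo()` returns two held-out accuracies. The first comes from a logistic-regression model trained on crowd labels of the real records, the second from one trained on crowd labels of the synthetic records. In this toy setting the two numbers should be close, which mirrors, but does not prove, the abstract's utility claim. A faithful reproduction would replace the Gaussian sampler with a trained GAN generator.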

Key words: Crowdsourcing, Privacy preservation, Generative adversarial networks

CLC number: TP181