Computer Science (计算机科学) ›› 2024, Vol. 51 ›› Issue (3): 30-38. doi: 10.11896/jsjkx.230700177
葛胤池, 张辉, 孙浩航
GE Yinchi, ZHANG Hui, SUN Haohang
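Abstract: Data sharing and publication unlock the value of data and can drive scientific and technological progress and socioeconomic development in the era of digital intelligence. Protecting data copyright and personal privacy while sharing data nevertheless remains a major challenge. Differentially private data synthesis is an effective means of protecting data privacy: by releasing synthetic data in place of real data, data holders can protect privacy and also improve the generality and usability of the data. To address the low utility of image samples synthesized by differentially private generative models, this paper proposes a two-stage differentially private generative model based on a latent diffusion model. First, differentially private perceptual information compression is applied to the original images, projecting them from pixel space into latent space to obtain a desensitized latent-vector representation of the original sensitive data. Then the latent vectors are fed into a diffusion model, which gradually transforms them into the prior distribution, and samples are drawn through the denoising process. Finally, the model is trained and used for data synthesis on the MNIST and Fashion MNIST datasets. The results show that it clearly improves on SOTA models such as DP-Sinkhorn in both FID and downstream task accuracy.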
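The step in which latent vectors are gradually transformed into the prior distribution can be illustrated with the standard closed-form forward process of a denoising diffusion model. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the linear noise schedule, the number of steps, the function names `alpha_bar_schedule` and `diffuse_latent`, and the 4-dimensional latent vector are all hypothetical, and the differentially private compression stage is not modeled.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The linear schedule and its endpoints are assumptions for illustration;
    the abstract does not state which schedule is used.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def diffuse_latent(z0, t, alpha_bars, rng):
    """Sample z_t from q(z_t | z_0) = N(sqrt(a_bar_t) * z_0, (1 - a_bar_t) * I)."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * z + noise * rng.gauss(0.0, 1.0) for z in z0]


rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)

z0 = [2.0, -1.5, 0.5, 3.0]  # hypothetical latent vector from the first stage
z_early = diffuse_latent(z0, 0, alpha_bars, rng)    # still close to z0
z_late = diffuse_latent(z0, 999, alpha_bars, rng)   # close to a standard normal draw
```

At the first step almost all of the latent signal is retained, while at the last step the signal coefficient is close to zero, so the latent is approximately a sample from the standard normal prior. Sampling would then run a learned denoiser in the reverse direction, which this sketch leaves out.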