Computer Science, 2024, Vol. 51, Issue (3): 30-38. doi: 10.11896/jsjkx.230700177

• Information Security Protection under New Computing Paradigms •

Differential Privacy Data Synthesis Method Based on Latent Diffusion Model

GE Yinchi, ZHANG Hui, SUN Haohang

  1. State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing 100191, China
  • Received: 2023-07-24 Revised: 2023-12-08 Online: 2024-03-15 Published: 2024-03-13
  • Corresponding author: ZHANG Hui (hzhang@buaa.edu.cn)
  • About author: GE Yinchi (geyinchi@buaa.edu.cn), born in 1996, Ph.D., is a student member of CCF (No. P1106G). His main research interests include data security, privacy-preserving computing, and artificial intelligence. ZHANG Hui, born in 1968, professor, Ph.D. supervisor. His main research interests include big data management and mining, data security, and blockchain.


Abstract: Data sharing and publication unlock the value of data and drive scientific progress and socio-economic development in the digital-intelligence era. However, protecting data copyright and personal privacy while sharing data remains a major challenge. Differentially private data synthesis is an effective means of protecting data privacy: data holders release synthetic data in place of real data, which protects privacy while also improving the generality and usability of the data. To address the low usability of image samples synthesized by existing differentially private generative models, this paper proposes a two-stage differentially private generative model based on the latent diffusion model. First, differential privacy-aware information compression is applied to the original images, projecting them from pixel space into a latent space to obtain desensitized latent vector representations of the original sensitive data. These latent vectors are then fed into a diffusion model, which gradually transforms them into a prior distribution, and new samples are drawn through the denoising process. Finally, the model is trained and used for data synthesis on the MNIST and Fashion MNIST datasets. The results show that it achieves significant improvements in Fréchet inception distance (FID) and downstream task accuracy over state-of-the-art models such as DP-Sinkhorn.

Key words: Differential privacy, Data synthesis, Generative models, Autoencoder, Diffusion models
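The abstract does not say how the "differential privacy-aware information compression" injects privacy into the latent vectors. One standard building block that fits the description is the Gaussian mechanism applied to each encoded latent: clip the vector to a fixed L2 norm, then add isotropic Gaussian noise calibrated to (ε, δ). The sketch below is a hypothetical illustration, not the authors' method. It assumes the encoder is fixed and was not trained on the private images (or was trained privately), that neighbouring datasets differ by replacing one image, and it uses the classic calibration σ = Δ₂·√(2 ln(1.25/δ))/ε, which holds for 0 < ε < 1. The helper names `clip_l2`, `gaussian_sigma` and `privatize_latent` are invented for this example.

```python
import math
import random
from typing import List


def clip_l2(z: List[float], clip_norm: float) -> List[float]:
    """Rescale z so its L2 norm is at most clip_norm (vectors already inside are unchanged)."""
    norm = math.sqrt(sum(v * v for v in z))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in z]


def gaussian_sigma(epsilon: float, delta: float, l2_sensitivity: float) -> float:
    """Classic Gaussian-mechanism noise scale; this calibration is only valid for 0 < epsilon < 1."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("classic calibration assumes 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def privatize_latent(z: List[float], clip_norm: float, epsilon: float,
                     delta: float, rng: random.Random) -> List[float]:
    """Hypothetical DP release of one encoded latent: clip it, then add isotropic Gaussian noise."""
    clipped = clip_l2(z, clip_norm)
    # Replacing one image swaps one clipped vector for another; both lie in the
    # L2 ball of radius clip_norm, so the L2 sensitivity is at most 2 * clip_norm.
    # Each record's latent gets independent noise, so releasing all of them is (epsilon, delta)-DP.
    sigma = gaussian_sigma(epsilon, delta, 2.0 * clip_norm)
    return [v + rng.gauss(0.0, sigma) for v in clipped]


if __name__ == "__main__":
    rng = random.Random(0)
    z = [0.8, -1.3, 0.4]  # stand-in for an encoder output (encoder assumed fixed)
    print(privatize_latent(z, clip_norm=1.0, epsilon=0.5, delta=1e-5, rng=rng))
```

The per-coordinate noise σ does not depend on the latent dimension d, yet the expected squared norm of the added noise is d·σ² while the clipped signal norm is at most the clipping bound. A lower-dimensional latent therefore gives a better signal-to-noise ratio at the same (ε, δ), which is one plausible reading of why the method compresses images before privatizing them.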

CLC number: TP183
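For the second stage, the abstract describes a diffusion model that gradually turns the latent vectors into a prior distribution and then generates samples by denoising. The sketch below shows that mechanism in the style of DDPM (Ho et al., 2020): a linear β schedule from 1e-4 to 0.02 over 1,000 steps, the closed-form forward process q(z_t | z_0) = N(√ᾱ_t·z_0, (1 − ᾱ_t)·I), and ancestral sampling with σ_t² = β_t. The schedule, step count and sampler are assumptions, since the abstract does not give them. So that the example runs without a trained network, the noise predictor is replaced by a hypothetical analytic oracle, `gaussian_oracle_eps`, which returns the exact E[ε | z_t] when every latent coordinate is i.i.d. N(μ, s²). The decoder that maps sampled latents back to pixel space is omitted.

```python
import math
import random
from typing import Callable, List, Tuple

# A noise predictor maps (z_t, t) to an estimate of the noise that was added.
EpsModel = Callable[[List[float], int], List[float]]


def linear_schedule(num_steps: int = 1000, beta_start: float = 1e-4,
                    beta_end: float = 0.02) -> Tuple[List[float], List[float]]:
    """Linear beta schedule (DDPM defaults) and the cumulative products alpha_bar_t."""
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1) for t in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return betas, alpha_bars


def q_sample(z0: List[float], t: int, alpha_bars: List[float], rng: random.Random) -> List[float]:
    """Forward process in closed form: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * noise."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in z0]


def p_sample_loop(eps_model: EpsModel, dim: int, betas: List[float],
                  alpha_bars: List[float], rng: random.Random) -> List[float]:
    """Ancestral DDPM sampling: start from the N(0, I) prior and denoise step by step."""
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(betas))):
        eps = eps_model(z, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        inv_sqrt_alpha = 1.0 / math.sqrt(1.0 - betas[t])
        mean = [inv_sqrt_alpha * (zi - coef * ei) for zi, ei in zip(z, eps)]
        if t > 0:  # sigma_t^2 = beta_t; the last step returns the mean without noise
            noise_std = math.sqrt(betas[t])
            z = [m + noise_std * rng.gauss(0.0, 1.0) for m in mean]
        else:
            z = mean
    return z


def gaussian_oracle_eps(mu: float, s: float, alpha_bars: List[float]) -> EpsModel:
    """Hypothetical stand-in for a trained denoiser: exact E[noise | z_t]
    when every latent coordinate is independently N(mu, s^2)."""
    def eps_model(z: List[float], t: int) -> List[float]:
        ab = alpha_bars[t]
        scale = math.sqrt(1.0 - ab) / (ab * s * s + 1.0 - ab)  # Cov(noise, z_t) / Var(z_t)
        shift = math.sqrt(ab) * mu  # marginal mean of z_t
        return [scale * (zi - shift) for zi in z]
    return eps_model


if __name__ == "__main__":
    betas, alpha_bars = linear_schedule()
    # alpha_bar_T is close to 0, so z_T is close to the N(0, I) prior.
    print(f"alpha_bar_T = {alpha_bars[-1]:.1e}")
```

For example, `p_sample_loop(gaussian_oracle_eps(2.0, 0.5, alpha_bars), 1000, betas, alpha_bars, random.Random(0))` treats 1,000 coordinates as independent scalar latents, and their empirical mean and standard deviation should come out close to 2.0 and 0.5. In the real model, `eps_model` would be a trained network. Assuming, as the abstract suggests, that the diffusion model only ever sees the desensitized latents, the post-processing property of differential privacy means this stage adds no privacy cost beyond that of the first stage.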