Computer Science ›› 2024, Vol. 51 ›› Issue (1): 273-283. doi: 10.11896/jsjkx.230300057
YAN Zhihao, ZHOU Zhangbing, LI Xiaocui
Abstract: Diffusion models deliver high-quality sample generation among generative models; since their introduction they have repeatedly set new records on the FID score, the standard image-generation evaluation metric, and have become a research hotspot in the field, yet surveys of this kind remain scarce in China. This paper therefore collects and analyzes research on diffusion generative models. First, it discusses the characteristics and principles of three general model classes: denoising diffusion probabilistic models, score-based diffusion generative models, and diffusion generative models based on stochastic differential equations, and for each basic class it analyzes derivative models whose improvements target the optimization of internal algorithms and efficient sampling. Second, it summarizes current applications of diffusion models in computer vision, natural language processing, time series, multimodal, and interdisciplinary domains. Finally, based on the above discussion, it offers suggestions addressing the current limitations of diffusion generative models, such as the large number of sampling steps and the long sampling time, and, drawing on the preceding research, assesses future directions for the development of diffusion generative models.
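The denoising diffusion probabilistic models discussed above rest on a closed-form forward noising process, q(x_t | x_0) = N(sqrt(ᾱ_t)·x_0, (1-ᾱ_t)·I), with a network trained to predict the injected noise. A minimal NumPy sketch of that forward step (the schedule length, β range, and toy data here are illustrative assumptions, not values from this survey):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form for a DDPM forward process."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)              # ᾱ_t = ∏_{s<=t} (1 - β_s)
    noise = rng.standard_normal(x0.shape)       # ε ~ N(0, I)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise                            # training predicts ε from (x_t, t)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)           # linear variance schedule
x0 = rng.standard_normal(8)                     # toy "image" of 8 pixels
xt, eps = forward_diffuse(x0, 500, betas, rng)
```

Because ᾱ_t shrinks toward zero as t grows, x_t drifts toward pure Gaussian noise; reversing this chain one step at a time is what makes sampling slow, which motivates the accelerated-sampling work the survey reviews.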