Computer Science ›› 2024, Vol. 51 ›› Issue (1): 273-283.doi: 10.11896/jsjkx.230300057

• Artificial Intelligence •

Survey on Generative Diffusion Model

YAN Zhihao, ZHOU Zhangbing, LI Xiaocui   

  1. School of Information Engineering, China University of Geosciences (Beijing), Beijing 100083, China
  • Received: 2023-03-06 Revised: 2023-07-01 Online: 2024-01-15 Published: 2024-01-12
  • About author: YAN Zhihao, born in 1995, postgraduate. His main research interest is deep learning.
    ZHOU Zhangbing, born in 1974, Ph.D., professor, is a member of CCF (No.28475M). His main research interests include wireless sensor networks, services computing and business process management.
  • Supported by:
    National Natural Science Foundation of China (42050103).

Abstract: Since their introduction, diffusion models have demonstrated high-quality sample generation in the field of generative models, repeatedly setting new records on the FID score, a standard evaluation metric for image generation, and have become a research hotspot in this field. However, related surveys remain scarce in China. This paper therefore summarizes and analyzes research on diffusion-based generative models. First, it analyzes the derivative models of each basic diffusion model, which focus on optimizing internal algorithms and on efficient sampling, by discussing the characteristics and principles of three common formulations: the denoising diffusion probabilistic model, the score-based diffusion generative model, and the diffusion generative model based on stochastic differential equations. Second, it summarizes current applications of diffusion models in computer vision, natural language processing, time series, multimodal learning, and interdisciplinary fields. Finally, based on the above discussion, it offers suggestions for addressing the existing limitations of diffusion generative models, such as long sampling times and the large number of sampling steps required, and outlines directions for future research on diffusion generative models.
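The denoising diffusion probabilistic model named in the abstract corrupts data with a fixed forward noising process and trains a network to reverse it. As a rough illustration (not the survey's or any cited paper's implementation), the forward step can be sampled in closed form; the linear variance schedule below is an assumed, commonly used choice:

```python
import math
import random

# Assumed linear variance schedule beta_1..beta_T (illustrative choice only)
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# Cumulative signal-retention factor: alpha_bar_t = prod_{s<=t} (1 - beta_s)
alpha_bars = []
prod = 1.0
for beta in betas:
    prod *= 1.0 - beta
    alpha_bars.append(prod)

def q_sample(x0, t, rng):
    """Closed-form forward step:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    c_signal = math.sqrt(alpha_bars[t])
    c_noise = math.sqrt(1.0 - alpha_bars[t])
    return [c_signal * x + c_noise * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(8)]  # toy one-dimensional "image"
x_mid = q_sample(x0, t=500, rng=rng)          # partially noised sample
x_T = q_sample(x0, t=T - 1, rng=rng)          # near-pure Gaussian noise at t = T
```

Because alpha_bar_t shrinks toward zero as t grows, x_T is almost independent of x_0; reversing this process step by step, with a learned network predicting the added noise, is what makes sampling slow, which motivates the efficient-sampling work the survey reviews.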

Key words: Deep learning, Generative models, Denoising diffusion probabilistic models, Score-based diffusion models, Stochastic differential equations, Image generation

CLC Number: TP183