Computer Science ›› 2024, Vol. 51 ›› Issue (1): 273-283.doi: 10.11896/jsjkx.230300057

• Artificial Intelligence •

Survey on Generative Diffusion Model

YAN Zhihao, ZHOU Zhangbing, LI Xiaocui   

  1. School of Information Engineering, China University of Geosciences (Beijing), Beijing 100083, China
  • Received: 2023-03-06 Revised: 2023-07-01 Online: 2024-01-15 Published: 2024-01-12
  • About author: YAN Zhihao, born in 1995, postgraduate. His main research interest is deep learning.
    ZHOU Zhangbing, born in 1974, Ph.D., professor, is a member of CCF (No.28475M). His main research interests include wireless sensor networks, services computing and business process management.
  • Supported by:
    National Natural Science Foundation of China (42050103).

Abstract: Since their introduction, diffusion models have demonstrated high-quality sample generation in the field of generative models, repeatedly setting new records on the FID score, a standard evaluation metric for image generation, and have become a research hotspot in this field. However, related surveys remain scarce in China. This paper therefore summarizes and analyzes research on diffusion-based generative models. First, it analyzes the derivative models of each basic diffusion model, which focus on optimizing internal algorithms and on efficient sampling, by discussing the characteristics and principles of three common formulations: the denoising diffusion probabilistic model, the score-based diffusion generative model, and the diffusion generative model based on stochastic differential equations. Second, it summarizes current applications of diffusion models in computer vision, natural language processing, time series, multimodal learning, and interdisciplinary fields. Finally, based on the above discussion, it offers suggestions for addressing the existing limitations of diffusion generative models, such as long sampling times and the large number of sampling steps required, and outlines directions for future research on diffusion generative models.
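The denoising diffusion probabilistic model named in the abstract corrupts data with a fixed forward noising process and trains a network to reverse it. As a rough illustration (not the survey's or any cited paper's implementation), the forward step can be sampled in closed form; the linear variance schedule below is an assumed, commonly used choice:

```python
import math
import random

# Assumed linear variance schedule beta_1..beta_T (illustrative choice only)
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# Cumulative signal-retention factor: alpha_bar_t = prod_{s<=t} (1 - beta_s)
alpha_bars = []
prod = 1.0
for beta in betas:
    prod *= 1.0 - beta
    alpha_bars.append(prod)

def q_sample(x0, t, rng):
    """Closed-form forward step:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    c_signal = math.sqrt(alpha_bars[t])
    c_noise = math.sqrt(1.0 - alpha_bars[t])
    return [c_signal * x + c_noise * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(8)]  # toy one-dimensional "image"
x_mid = q_sample(x0, t=500, rng=rng)          # partially noised sample
x_T = q_sample(x0, t=T - 1, rng=rng)          # near-pure Gaussian noise at t = T
```

Because alpha_bar_t shrinks toward zero as t grows, x_T is almost independent of x_0; reversing this process step by step, with a learned network predicting the added noise, is what makes sampling slow, which motivates the efficient-sampling work the survey reviews.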

Key words: Deep learning, Generative models, Denoising diffusion probabilistic models, Score-based diffusion models, Stochastic differential equations, Image generation

CLC Number: TP183