Computer Science ›› 2024, Vol. 51 ›› Issue (1): 273-283. doi: 10.11896/jsjkx.230300057
YAN Zhihao, ZHOU Zhangbing, LI Xiaocui
Abstract: Diffusion models deliver high-quality sample generation among generative models; since their introduction they have repeatedly set new records on the FID score, the standard image-generation evaluation metric, and have become a research hotspot in the field, yet surveys of this kind remain scarce in China. This paper therefore collects and analyzes research on diffusion generative models. First, it discusses the characteristics and principles of three general model classes: denoising diffusion probabilistic models, score-based diffusion generative models, and diffusion generative models based on stochastic differential equations, and for each basic class it analyzes derivative models whose improvements target the optimization of internal algorithms and efficient sampling. Second, it summarizes current applications of diffusion models in computer vision, natural language processing, time series, multimodal, and interdisciplinary domains. Finally, based on the above discussion, it offers suggestions addressing the current limitations of diffusion generative models, such as the large number of sampling steps and the long sampling time, and, drawing on the preceding research, assesses future directions for the development of diffusion generative models.
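The denoising diffusion probabilistic models discussed above rest on a closed-form forward noising process, q(x_t | x_0) = N(sqrt(ᾱ_t)·x_0, (1-ᾱ_t)·I), with a network trained to predict the injected noise. A minimal NumPy sketch of that forward step (the schedule length, β range, and toy data here are illustrative assumptions, not values from this survey):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form for a DDPM forward process."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)              # ᾱ_t = ∏_{s<=t} (1 - β_s)
    noise = rng.standard_normal(x0.shape)       # ε ~ N(0, I)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise                            # training predicts ε from (x_t, t)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)           # linear variance schedule
x0 = rng.standard_normal(8)                     # toy "image" of 8 pixels
xt, eps = forward_diffuse(x0, 500, betas, rng)
```

Because ᾱ_t shrinks toward zero as t grows, x_t drifts toward pure Gaussian noise; reversing this chain one step at a time is what makes sampling slow, which motivates the accelerated-sampling work the survey reviews.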