Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240500094-9. doi: 10.11896/jsjkx.240500094

• Image Processing & Multimedia Technology •

  • Corresponding author: LI Bicheng (lbclm@163.com)
  • Author email: 18750686655@163.com

High Quality Image Generation Method Based on Improved Diffusion Model

HOU Zhexiao, LI Bicheng, CAI Bingyan, XU Yifei   

  1. College of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021, China
  • Online: 2025-06-16 Published: 2025-06-12
  • About author: HOU Zhexiao, born in 1999, postgraduate. His main research interests include AIGC and AIGC detection.
    LI Bicheng, born in 1970, Ph.D., professor, Ph.D. supervisor. His main research interests include intelligent information processing, network ideological security, network public opinion monitoring and guidance, and big data analysis and mining.
  • Supported by:
    Joint Fund of Equipment Pre-research and Ministry of Education (8091B022150) and Xiamen Major Science and Technology Project (20220404).


Abstract: Image generation is a research focus of AIGC in the AI 2.0 era, and the iteration of generative models has driven the development of image generation technology. At present, the sample quality of mainstream generative models is low and cannot meet AIGC's requirement of high image fidelity, while the emerging diffusion model cannot achieve high-quality generation in the unconditional setting. Therefore, this paper proposes a high-quality image generation method based on an improved diffusion model. Firstly, a diffusion model with stable training and excellent sampling quality is adopted as the baseline model. Secondly, the self-attention mechanism within the diffusion model is used to guide noise generation, so as to restore the low-frequency content of the image and enhance the stability of the denoising process. Finally, a recursive feature pyramid is integrated into the noise predictor structure so that image feature information is repeatedly refined, capturing the rich high-frequency details in the image. Comparison and ablation experiments are performed on three standard datasets and four small datasets. The results show that the proposed method outperforms the other methods.
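The baseline referred to above is the standard denoising diffusion probabilistic model (DDPM) of Ho et al. As a point of reference, below is a minimal NumPy sketch of the DDPM forward noising process and one reverse (ancestral sampling) step. The closed-form expressions follow the original DDPM formulation; the toy data and the assumption of a perfect noise estimate are illustrative only and stand in for the paper's U-Net noise predictor.

```python
import numpy as np

# Linear beta schedule as in the original DDPM paper (Ho et al., 2020)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, eps):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def reverse_step(x_t, t, eps_hat, z):
    """One ancestral sampling step x_t -> x_{t-1} given predicted noise."""
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    return mean + (np.sqrt(betas[t]) * z if t > 0 else 0.0)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))     # toy "image"
eps = rng.standard_normal(x0.shape)
x_t = q_sample(x0, 500, eps)         # heavily noised sample at t = 500

# With a perfect noise estimate (eps_hat = eps) the step moves x_t back
# toward the data; in practice a trained noise predictor plays this role.
x_prev = reverse_step(x_t, 500, eps, rng.standard_normal(x0.shape))
```

The paper's contributions then modify this loop: self-attention guidance steers the predicted noise during the reverse steps, and the recursive feature pyramid changes the internal structure of the noise predictor.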

Key words: Image generation, Diffusion model, Self-attention mechanism guide, Recursive feature pyramid
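The recursive feature pyramid named in the keywords originates in DetectoRS (Qiao et al.): the outputs of a feature pyramid network (FPN) are fed back into the backbone for another extraction pass, so multi-scale features are refined repeatedly. A toy NumPy sketch of that feedback loop is given below; average pooling stands in for the real convolutional stages, and all function names here are illustrative, not the authors' actual architecture.

```python
import numpy as np

def backbone(x, feedback=None):
    """Toy 3-level backbone: each level halves resolution via average pooling.
    Optional per-level feedback from a previous FPN pass is added in, as in
    a recursive feature pyramid."""
    feats, h = [], x
    for i in range(3):
        h = h.reshape(h.shape[0] // 2, 2, h.shape[1] // 2, 2).mean(axis=(1, 3))
        if feedback is not None:
            h = h + feedback[i]
        feats.append(h)
    return feats

def fpn(feats):
    """Toy top-down pathway: upsample each coarser level and add it to the
    next finer one (per-level shapes are preserved)."""
    outs = [feats[-1]]
    for f in reversed(feats[:-1]):
        up = np.kron(outs[0], np.ones((2, 2)))  # nearest-neighbour upsample
        outs.insert(0, f + up)
    return outs

def recursive_fpn(x, steps=2):
    """Unrolled recursion: FPN outputs feed back into the backbone."""
    feats = backbone(x)
    for _ in range(steps - 1):
        feats = backbone(x, feedback=fpn(feats))
    return fpn(feats)

x = np.ones((16, 16))                # toy single-channel "feature map"
pyramid = recursive_fpn(x, steps=2)  # second pass refines the first
```

In the paper this feedback structure is grafted onto the noise predictor so that features are "purified" across passes, with the goal of recovering high-frequency detail.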

CLC number: TP391