Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240500094-9.doi: 10.11896/jsjkx.240500094

• Image Processing & Multimedia Technology •

High Quality Image Generation Method Based on Improved Diffusion Model

HOU Zhexiao, LI Bicheng, CAI Bingyan, XU Yifei   

  1. College of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021, China
  • Online: 2025-06-16  Published: 2025-06-12
  • About author: HOU Zhexiao, born in 1999, postgraduate. His main research interests include AIGC and AIGC detection.
    LI Bicheng, born in 1970, Ph.D, professor, Ph.D supervisor. His main research interests include intelligent information processing, network ideological security, network public opinion monitoring and guidance, and big data analysis and mining.
  • Supported by:
    Joint Fund of Equipment Pre-research and Ministry of Education (8091B022150) and Xiamen Major Science and Technology Project (20220404).

Abstract: Image generation is a research focus of AIGC in the AI 2.0 era, and the iteration of generative models drives the development of image generation technology. At present, the sample quality of mainstream generative models is low and cannot meet AIGC's high-fidelity requirements for images, and emerging diffusion models still fall short of high-quality results in unconditional generation. Therefore, this paper proposes a high-quality image generation method based on an improved diffusion model. First, a diffusion model with stable training and excellent sampling quality is adopted as the baseline. Second, the self-attention mechanism in the diffusion model is used to guide noise generation, restoring the low-frequency content of the image and enhancing the stability of the denoising process. Finally, a recursive feature pyramid is integrated into the noise-predictor structure, repeatedly refining image feature information to capture the rich high-frequency details in the image. Comparison and ablation experiments are conducted on three standard datasets and four small datasets, and the results show that the proposed method outperforms the compared methods.
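To make the three-step pipeline in the abstract concrete, the sketch below shows a minimal DDPM-style training step in PyTorch. It is an illustrative reconstruction under stated assumptions, not the authors' code: the linear noise schedule follows the standard DDPM setup, while the `TinyNoisePredictor` class and the places where self-attention guidance and a recursive feature pyramid would be wired in are hypothetical markers for this sketch.

```python
# Minimal DDPM-style training-step sketch (PyTorch). Illustrative only, not the
# authors' released code: the noise schedule follows standard DDPM, while
# TinyNoisePredictor and the placement of the self-attention guidance /
# recursive feature pyramid hooks are assumptions made for this sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # linear beta schedule (DDPM default)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative product of alphas


def forward_diffuse(x0, t, noise):
    """q(x_t | x_0): noise clean images x0 to step t in closed form."""
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise


class TinyNoisePredictor(nn.Module):
    """Stand-in for the paper's U-Net noise predictor (hypothetical).

    The paper's model uses self-attention to guide noise generation and a
    recursive feature pyramid to refine features; the comments below only
    mark where such components would plausibly sit.
    """

    def __init__(self, channels=3, width=64):
        super().__init__()
        self.enc = nn.Conv2d(channels, width, 3, padding=1)
        self.attn = nn.MultiheadAttention(width, num_heads=4, batch_first=True)
        self.dec = nn.Conv2d(width, channels, 3, padding=1)

    def forward(self, x_t, t):
        # Time embedding omitted for brevity.
        h = F.silu(self.enc(x_t))
        # Spatial self-attention: one plausible hook for attention-guided
        # noise prediction aimed at low-frequency content (assumption).
        b, c, hh, ww = h.shape
        seq = h.flatten(2).transpose(1, 2)          # (B, H*W, C)
        seq, _ = self.attn(seq, seq, seq)
        h = h + seq.transpose(1, 2).view(b, c, hh, ww)
        # A recursive feature pyramid (DetectoRS-style) would refine
        # multi-scale features here to recover high-frequency detail.
        return self.dec(h)


def training_step(model, x0):
    """One simple-loss DDPM step: regress the noise that was added."""
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    x_t = forward_diffuse(x0, t, noise)
    return F.mse_loss(model(x_t, t), noise)


if __name__ == "__main__":
    loss = training_step(TinyNoisePredictor(), torch.randn(4, 3, 32, 32))
    print(f"toy loss: {loss.item():.4f}")
```

Sampling would then iterate the learned reverse transitions from pure noise; the abstract's claims about a more stable denoising process and richer high-frequency detail concern that reverse pass.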

Key words: Image generation, Diffusion model, Self-attention mechanism guidance, Recursive feature pyramid

CLC Number: TP391