Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240500094-9. doi: 10.11896/jsjkx.240500094
侯哲晓, 李弼程, 蔡炳炎, 许逸飞
HOU Zhexiao, LI Bicheng, CAI Bingyan, XU Yifei
Abstract: Image generation is a research focus of AIGC in the AI 2.0 era, and successive generations of generative models have driven the progress of image generation technology. Current mainstream generative models produce samples of limited quality and cannot meet AIGC's demand for high-fidelity images, while emerging diffusion models fail to achieve high-quality results in unconditional generation. This paper therefore proposes a high-quality image generation method based on an improved diffusion model. First, a diffusion model with stable training and excellent sampling quality is adopted as the baseline. Second, the self-attention mechanism within the diffusion model is used to further guide noise generation, restoring the low-frequency content of the image and stabilizing the denoising process. Finally, a recursive feature pyramid is fused into the noise-predictor architecture so that image feature information is repeatedly refined, capturing high-frequency details in the image. Comparative and ablation experiments on three standard datasets and four small datasets show that the proposed method outperforms competing approaches.
CLC Number:
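The abstract builds on the standard denoising diffusion (DDPM) framework. As a point of reference, the reverse (denoising) update that such a baseline iterates can be sketched as below; this is a minimal NumPy illustration of the generic DDPM step, not the paper's improved predictor, and the function name and linear beta schedule are assumptions for the example.

```python
import numpy as np

def ddpm_reverse_step(x_t, eps_hat, t, betas, rng):
    """One generic DDPM reverse step:
    x_{t-1} = (x_t - beta_t / sqrt(1 - abar_t) * eps_hat) / sqrt(alpha_t) + sigma_t * z.
    eps_hat is the noise predicted by the network (the paper's noise predictor
    augments this prediction with self-attention guidance and a recursive
    feature pyramid; here it is just an input array)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)          # cumulative product \bar{alpha}_t
    coef = betas[t] / np.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - coef * eps_hat) / np.sqrt(alphas[t])
    if t > 0:
        # add fresh Gaussian noise except at the final step
        sigma = np.sqrt(betas[t])
        return mean + sigma * rng.standard_normal(x_t.shape)
    return mean

# Example usage with a linear beta schedule (an assumed choice)
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 10)
x_t = rng.standard_normal((4, 4))
x_prev = ddpm_reverse_step(x_t, np.zeros((4, 4)), 0, betas, rng)
```

In a full sampler this step is applied from t = T-1 down to 0, starting from pure Gaussian noise; the paper's contributions modify how eps_hat is produced, not this outer loop.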