Computer Science (计算机科学), 2024, Vol. 51, Issue 8: 224-231. doi: 10.11896/jsjkx.230500038
何知霖1,2, 顾天昊1,2, 徐冠华1
HE Zhilin1,2, GU Tianhao1,2, XU Guanhua1
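Abstract: Image translation is an important research direction in computer vision, with wide applications in visual tasks such as image stylization and super-resolution image generation. To address the high cost of semantic annotation and the resulting difficulty of labeling datasets for image translation, this paper proposes a few-shot semantic image translation algorithm based on prototype correction. The algorithm consists mainly of three modules: StyleGAN, a semantic similarity regressor, and a pSp encoder. First, to reduce the model's dependence on labeled images, a pre-trained StyleGAN model serves as the generator, which increases the number of training samples available in the few-shot setting and improves the diversity of the generated images. Second, to account for intra-class variation in sample semantics, a semantic similarity regressor is designed to correct the prototypes, which improves the accuracy of the pseudo-labels and strengthens model optimization. Third, the feature maps of the labeled and synthesized images are combined with the prototype vectors to achieve cyclic synthesis of semantic information, yielding a self-supervised loss function that removes the need for label information when training the semantic similarity regressor; the pseudo-labeled images are then used to further train the pSp encoder, completing the semantic image translation task. Finally, experimental results show that the proposed algorithm outperforms classical algorithms in both generalization performance and diversity of the synthesized images.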
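The pseudo-labeling idea mentioned in the abstract can be illustrated with a minimal sketch of generic prototype-based labeling. This is not the paper's implementation: it assumes that each prototype is the mean feature vector of a class and that a feature receives the label of the most cosine-similar prototype. The function names (`class_prototypes`, `pseudo_labels`), the class names, and the toy 2-D features are all hypothetical. The semantic similarity regressor that corrects the prototypes is not reproduced here.

```python
import math
from collections import defaultdict


def class_prototypes(features, labels):
    """Return one prototype per class: the mean of that class's feature vectors."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def pseudo_labels(features, prototypes):
    """Assign each feature vector the label of its most similar prototype."""
    return [
        max(prototypes, key=lambda lab: cosine(vec, prototypes[lab]))
        for vec in features
    ]


# Toy data (assumption): per-pixel features from a few labeled images.
labeled_feats = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
labeled_classes = ["sky", "sky", "ground", "ground"]
protos = class_prototypes(labeled_feats, labeled_classes)

# Features of pixels from an unlabeled synthesized image (assumption).
synthetic_feats = [[0.9, 0.1], [0.1, 0.7]]
print(pseudo_labels(synthetic_feats, protos))  # ['sky', 'ground']
```

In a sketch like this, a single mean vector per class ignores intra-class variation, which is the limitation the abstract says the prototype correction step is meant to address.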