Computer Science, 2024, Vol. 51, Issue (8): 224-231. doi: 10.11896/jsjkx.230500038

• Computer Graphics & Multimedia •

Few-shot Semi-supervised Semantic Image Translation Algorithm Based on Prototype Correction

HE Zhilin1,2, GU Tianhao1,2, XU Guanhua1

  1 School of Automation, Qingdao University, Qingdao, Shandong 260000, China
  2 Institute of Intelligent Unmanned System, Qingdao University, Qingdao, Shandong 260000, China
  • Received: 2023-05-08 Revised: 2023-09-30 Online: 2024-08-15 Published: 2024-08-13
  • Corresponding author: GU Tianhao (gutianhao@qdu.edu.cn)
  • About author: HE Zhilin (hezhilin2000@163.com), born in 2000, postgraduate, is a member of CCF (No. P8040G). Her main research interests include machine learning and visual navigation.
    GU Tianhao, born in 1990, Ph.D, postgraduate supervisor. His main research interests include pattern recognition, machine learning, visual navigation and deep space exploration.
  • Supported by:
    National Natural Science Foundation of China (62076094, 61773227), China Postdoctoral Science Foundation (2022M721744) and Postdoctoral Innovation Talent Support Program of Shandong Province (SDBX2022023).

Abstract: Image translation is an important research direction in computer vision, with wide applications in visual tasks such as image stylization and super-resolution image generation. To address the high cost of semantic annotation and the difficulty of labeling datasets in image translation, this paper proposes a few-shot semantic image translation algorithm based on prototype correction. The algorithm mainly comprises a StyleGAN module, a semantic similarity regressor module, and a pSp encoder module. First, to reduce the model's dependence on labeled images, the algorithm uses a pre-trained StyleGAN model as the generator, which increases the number of training samples in the few-shot setting and improves the diversity of the generated images. Second, to account for intra-class variation in sample semantics, the algorithm designs a semantic similarity regressor that corrects the prototypes, improving the accuracy of the pseudo-labels and enhancing model optimization. Third, the feature maps of the labeled and synthesized images are combined with the prototype vectors to realize cyclic synthesis of semantic information, and a self-supervised loss function is constructed so that training the semantic similarity regressor does not require label information. The pSp encoder is then further trained with the pseudo-labeled images to carry out the semantic image translation task. Finally, experimental results show that the proposed algorithm outperforms classical algorithms in both generalization performance and the diversity of synthesized images.

Key words: Image translation, Prototype correction, Few-shot learning, Generative adversarial network

CLC Number:

  • TP391