Computer Science ›› 2023, Vol. 50 ›› Issue (12): 156-165. doi: 10.11896/jsjkx.221100027

• Computer Graphics & Multimedia •


Improved Fast Image Translation Model Based on Spatial Correlation and Feature Level Interpolation

LI Yuqiang, LI Huan, LIU Chun   

  1. College of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 435000, China
  • Received: 2022-11-04 Revised: 2023-04-04 Online: 2023-12-15 Published: 2023-12-07
  • Corresponding author: LIU Chun (liuchun@whut.edu.cn)
  • About author: LI Yuqiang (liyuqiang@whut.edu.cn), born in 1977, Ph.D, associate professor. His main research interests include machine learning, big data analysis, and image processing.
    LIU Chun, born in 1980, Ph.D, lecturer. Her main research interests include data mining, parallel computing, and machine learning.

Abstract: In recent years, with the popularity of deep learning algorithms, image translation tasks have achieved remarkable results. Much research has been devoted to reducing model running time while maintaining the quality of the generated images, and the ASAPNet model is a typical representative. However, the feature-level loss function of that model cannot fully decouple image features and appearance, and most of its computation is performed at an extremely low resolution, so the quality of the generated images is not ideal. To address these issues, this paper proposes SRFIT, an improved ASAPNet model based on spatial correlation and feature-level interpolation. Specifically, following the principle of self-similarity, a spatially-correlative loss replaces the feature matching loss of the original model to alleviate scene structure discrepancies during image translation and thus improve translation accuracy. In addition, inspired by the data augmentation method in ReMix, the amount of training data is increased through linear interpolation at the image feature level, which alleviates the overfitting of the generator. Finally, comparative experiments on two public datasets, CMP Facades and Cityscapes, show that the proposed SRFIT model outperforms current mainstream models: it effectively improves the quality of the generated images while maintaining a fast running speed.
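
To make the spatially-correlative loss concrete, below is a minimal PyTorch sketch of the self-similarity idea it builds on (cf. Zheng et al. [21]): self-similarity maps are computed from encoder features of the source and translated images and then compared. The function names, the cosine similarity taken over all spatial locations, and the L1 comparison are illustrative assumptions for exposition, not the exact formulation used in SRFIT.

import torch
import torch.nn.functional as F

def self_similarity_map(feat):
    # feat: (B, C, H, W) feature map from a fixed encoder (assumed layout)
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)              # flatten the spatial grid
    f = F.normalize(f, dim=1)               # unit-normalize each location's feature
    return torch.bmm(f.transpose(1, 2), f)  # (B, HW, HW) cosine self-similarities

def spatial_correlation_loss(feat_src, feat_gen):
    # Scene structure should survive translation, so the self-similarity
    # maps of the source and generated images are pushed to agree.
    return F.l1_loss(self_similarity_map(feat_src), self_similarity_map(feat_gen))

Because such a loss compares similarity patterns between locations rather than raw feature values, it is largely insensitive to appearance changes across domains, which is the decoupling property the abstract appeals to.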
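The feature-level linear interpolation can likewise be sketched as a mixup-style convex combination of encoder features within a batch (cf. [28,43]). The Beta(alpha, alpha) sampling and the random batch-permutation pairing below are assumptions carried over from mixup, shown only to illustrate the mechanism of augmenting data at the feature level.

import torch

def interpolate_features(feats, alpha=0.2):
    # feats: (B, C, H, W) encoder features for a batch (assumed layout)
    lam = torch.distributions.Beta(alpha, alpha).sample((feats.size(0),))
    lam = lam.to(feats.device).view(-1, 1, 1, 1)     # one coefficient per sample
    perm = torch.randperm(feats.size(0), device=feats.device)
    mixed = lam * feats + (1.0 - lam) * feats[perm]  # convex combination of pairs
    return mixed, lam, perm

Each mixed feature is a convex combination of two real samples' features, so the generator sees many more distinct inputs per epoch; this is how interpolation at the feature level counteracts generator overfitting.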

Key words: Image translation, Self-similarity, Data augmentation, GAN, Linear interpolation

CLC Number: 

  • TP391
[1]LEBEDEV V,GANIN Y,RAKHUBA M,et al.Speeding-up convolutional neural networks using fine-tuned cp-decomposition[C]//3rd International Conference on Learning Representations.San Diego,CA,USA,2015.
[2]ZHOU A,YAO A,GUO Y,et al.Incremental network quantization:Towards lossless cnns with low-precision weights[C]//5th International Conference on Learning Representations.Toulon,France,2017.
[3]LI M,LIN J,DING Y,et al.Gan compression:Efficient architectures for interactive conditional gans[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:5284-5294.
[4]ZOPH B,LE Q V.Neural architecture search with reinforcement learning[C]//5th International Conference on Learning Representations.Toulon,France,2017.
[5]HAN S,MAO H,DALLY W J.Deep compression:Compressing deep neural networks with pruning,trained quantization and huffman coding[J].arXiv:1510.00149,2015.
[6]LUO J H,WU J,LIN W.Thinet:A filter level pruning method for deep neural network compression[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:5058-5066.
[7]SHAHAM T R,GHARBI M,ZHANG R,et al.Spatially-adaptive pixelwise networks for fast image translation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:14882-14891.
[8]WANG T C,LIU M Y,ZHU J Y,et al.High-resolution image synthesis and semantic manipulation with conditional gans[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:8798-8807.
[9]PARK T,LIU M Y,WANG T C,et al.Semantic image synthesis with spatially-adaptive normalization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:2337-2346.
[10]KARRAS T,AITTALA M,HELLSTEN J,et al.Training generative adversarial networks with limited data[J].Advances in Neural Information Processing Systems,2020,33:12104-12114.
[11]ZHAO S,LIU Z,LIN J,et al.Differentiable augmentation for data-efficient gan training[J].Advances in Neural Information Processing Systems,2020,33:7559-7570.
[12]ISOLA P,ZHU J Y,ZHOU T,et al.Image-to-image translation with conditional adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:1125-1134.
[13]SHRIVASTAVA A,PFISTER T,TUZEL O,et al.Learning from simulated and unsupervised images through adversarial training[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:2107-2116.
[14]CHEN Q,KOLTUN V.Photographic image synthesis with cascaded refinement networks[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:1511-1520.
[15]KIM T,CHA M,KIM H,et al.Learning to discover cross-domain relations with generative adversarial networks[C]//International Conference on Machine Learning.PMLR,2017:1857-1865.
[16]ZHU J Y,PARK T,ISOLA P,et al.Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:2223-2232.
[17]YOO J,UH Y,CHUN S,et al.Photorealistic style transfer via wavelet transforms[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:9036-9045.
[18]DOSOVITSKIY A,BROX T.Generating images with perceptual similarity metrics based on deep networks[J].Advances in Neural Information Processing Systems,2016,29:658-666.
[19]JOHNSON J,ALAHI A,LI F F.Perceptual losses for real-time style transfer and super-resolution[C]//European Conference on Computer Vision.Cham:Springer,2016:694-711.
[20]PARK T,EFROS A A,ZHANG R,et al.Contrastive learning for unpaired image-to-image translation[C]//European Conference on Computer Vision.Cham:Springer,2020:319-345.
[21]ZHENG C,CHAM T J,CAI J.The spatially-correlative loss for various image translation tasks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:16407-16417.
[22]LIU M Y,HUANG X,MALLYA A,et al.Few-shot unsupervised image-to-image translation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:10551-10560.
[23]SAITO K,SAENKO K,LIU M Y.Coco-funit:Few-shot unsupervised image translation with a content conditioned style encoder[C]//European Conference on Computer Vision.Cham:Springer,2020:382-398.
[24]ZHANG H,ZHANG Z,ODENA A,et al.Consistency regularization for generative adversarial networks[C]//8th International Conference on Learning Representations.Addis Ababa,Ethiopia,2020.
[25]TRAN N T,TRAN V H,NGUYEN N B,et al.Towards good practices for data augmentation in gan training[J].arXiv:2006.05338,2020.
[26]ZHAO Z,ZHANG Z,CHEN T,et al.Image augmentations for gan training[J].arXiv:2006.02595,2020.
[27]DEVRIES T,TAYLOR G W.Improved regularization of convolutional neural networks with cutout[J].arXiv:1708.04552,2017.
[28]CAO J,HOU L,YANG M H,et al.Remix:Towards image-to-image translation with limited data[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:15018-15027.
[29]TAN Z,CHEN D,CHU Q,et al.Efficient semantic image synthesis via class-adaptive normalization[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2021,44:4852-4866.
[30]CHEN Y J,CHENG S I,CHIU W C,et al.Vector Quantized Image-to-Image Translation[C]//European Conference on Computer Vision.Cham:Springer,2022:440-456.
[31]RONNEBERGER O,FISCHER P,BROX T.U-net:Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-assisted Intervention.Cham:Springer,2015:234-241.
[32]QI X,CHEN Q,JIA J,et al.Semi-parametric image synthesis[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:8808-8816.
[33]LIU X,YIN G,SHAO J,et al.Learning to predict layout-to-image conditional convolutions for semantic image synthesis[J].Advances in Neural Information Processing Systems,2019,32:570-580.
[34]LIU M Y,BREUEL T,KAUTZ J.Unsupervised image-to-image translation networks[J].Advances in Neural Information Processing Systems,2017,30:700-708.
[35]HUANG X,LIU M Y,BELONGIE S,et al.Multimodal unsupervised image-to-image translation[C]//Proceedings of the European Conference on Computer Vision (ECCV).2018:172-189.
[36]SHECHTMAN E,IRANI M.Matching local self-similarities across images and videos[C]//2007 IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2007:1-8.
[37]SHI J,MALIK J.Normalized cuts and image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2000,22(8):888-905.
[38]XU K,BA J,KIROS R,et al.Show,attend and tell:Neural image caption generation with visual attention[C]//International Conference on Machine Learning.PMLR,2015:2048-2057.
[39]CHAWLA N V,BOWYER K W,HALL L O,et al.SMOTE:synthetic minority over-sampling technique[J].Journal of Artificial Intelligence Research,2002,16:321-357.
[40]BECKHAM C,HONARI S,VERMA V,et al.On adversarial mixup resynthesis[J].Advances in Neural Information Processing Systems,2019,32:4346-4357.
[41]BERTHELOT D,CARLINI N,GOODFELLOW I,et al.Mixmatch:A holistic approach to semi-supervised learning[J].Advances in Neural Information Processing Systems,2019,32:5049-5059.
[42]DEVRIES T,TAYLOR G W.Dataset augmentation in feature space[C]//5th International Conference on Learning Representations.Toulon,France,2017.
[43]ZHANG H,CISSE M,DAUPHIN Y N,et al.mixup:Beyond empirical risk minimization[C]//6th International Conference on Learning Representations.Vancouver,BC,Canada,2018.
[44]WAN J,TANG S,ZHANG Y,et al.Hdidx:High-dimensional indexing for efficient approximate nearest neighbor search[J].Neurocomputing,2017,237:401-404.
[45]SZEGEDY C,VANHOUCKE V,IOFFE S,et al.Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:2818-2826.
[46]PEREYRA G,TUCKER G,CHOROWSKI J,et al.Regularizing neural networks by penalizing confident output distributions[C]//5th International Conference on Learning Representations.Toulon,France,2017.
[47]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[48]JIANG L,DAI B,WU W,et al.Focal frequency loss for image reconstruction and synthesis[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:13919-13929.
[49]TYLEČEK R,ŠÁRA R.Spatial pattern templates for recognition of objects with regular structure[C]//German Conference on Pattern Recognition.Berlin:Springer,2013:364-374.
[50]CORDTS M,OMRAN M,RAMOS S,et al.The cityscapes dataset for semantic urban scene understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:3213-3223.
[51]HEUSEL M,RAMSAUER H,UNTERTHINER T,et al.Gans trained by a two time-scale update rule converge to a local nash equilibrium[J].Advances in Neural Information Processing Systems,2017,30:6626-6637.
[52]YU F,KOLTUN V,FUNKHOUSER T.Dilated residual networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:472-480.