Computer Science ›› 2023, Vol. 50 ›› Issue (12): 156-165.doi: 10.11896/jsjkx.221100027

• Computer Graphics &amp; Multimedia •

Improved Fast Image Translation Model Based on Spatial Correlation and Feature Level Interpolation

LI Yuqiang, LI Huan, LIU Chun   

  1. College of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 435000, China
  • Received: 2022-11-04 Revised: 2023-04-04 Online: 2023-12-15 Published: 2023-12-07
  • About author: LI Yuqiang, born in 1977, Ph.D, associate professor. His main research interests include machine learning, big data analysis, and image processing.
    LIU Chun, born in 1980, Ph.D, lecturer. Her main research interests include data mining, parallel computing, and machine learning.

Abstract: In recent years, with the popularity of deep learning algorithms, image translation tasks have achieved remarkable results. Much research has been devoted to reducing model running time while maintaining the quality of the generated images, of which the ASAPNet model is a typical representative. However, the feature-level loss function of that model cannot completely decouple image structure from appearance, and most of its computation is performed at extremely low resolution, resulting in poor image quality. To address these issues, this paper proposes SRFIT, an improved ASAPNet model based on spatial correlation and feature-level interpolation. Specifically, following the principle of self-similarity, a spatially-correlative loss replaces the feature matching loss of the original model to alleviate scene structure differences during image translation, thereby improving translation accuracy. In addition, inspired by the data augmentation method in ReMix, the amount of training data is increased at the image feature level through linear interpolation, which alleviates the overfitting problem of the generator. Finally, comparative experiments on two public datasets, facades and cityscapes, show that compared with current mainstream models, the proposed method achieves better performance: it effectively improves the quality of generated images while maintaining a fast running speed.
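The two ideas in the abstract can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the spatially-correlative loss of Zheng et al. compares patch-wise feature similarities from a learned encoder, which is simplified here to a global cosine self-similarity map, and `feature_mixup` is a generic mixup-style interpolation standing in for the ReMix-inspired feature-level augmentation. All function names are illustrative.

```python
import numpy as np

def self_similarity(feat):
    """feat: (C, N) array of C-dim features at N spatial positions.
    Returns the (N, N) cosine self-similarity map, which encodes
    scene structure independently of per-position feature magnitude."""
    f = feat / (np.linalg.norm(feat, axis=0, keepdims=True) + 1e-8)
    return f.T @ f

def spatial_correlation_loss(feat_src, feat_gen):
    """L1 distance between the self-similarity maps of the source and
    generated images; small when their spatial structure agrees, even
    if their appearance (feature values) differs."""
    return np.abs(self_similarity(feat_src) - self_similarity(feat_gen)).mean()

def feature_mixup(f_i, f_j, lam):
    """Mixup-style linear interpolation of two feature tensors,
    producing an extra training sample at the feature level."""
    return lam * f_i + (1.0 - lam) * f_j
```

Because cosine similarity is invariant to positive per-position scaling, rescaling all features (an "appearance" change) leaves the self-similarity map, and hence the loss, unchanged; only rearranging features spatially (a "structure" change) increases it.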

Key words: Image translation, Self-similarity, Data augmentation, GAN, Linear interpolation

CLC Number: TP391
[1]LEBEDEV V,GANIN Y,RAKHUBA M,et al.Speeding-up convolutional neural networks using fine-tuned cp-decomposition[C]//3rd International Conference on Learning Representations.San Diego,CA,USA,2015.
[2]ZHOU A,YAO A,GUO Y,et al.Incremental network quantization:Towards lossless cnns with low-precision weights[C]//5th International Conference on Learning Representations.Toulon,France,2017.
[3]LI M,LIN J,DING Y,et al.Gan compression:Efficient architectures for interactive conditional gans[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:5284-5294.
[4]ZOPH B,LE Q V.Neural architecture search with reinforcement learning[C]//5th International Conference on Learning Representations.Toulon,France,2017.
[5]HAN S,MAO H,DALLY W J.Deep compression:Compressing deep neural networks with pruning,trained quantization and huffman coding[J].arXiv:1510.00149,2015.
[6]LUO J H,WU J,LIN W.Thinet:A filter level pruning method for deep neural network compression[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:5058-5066.
[7]SHAHAM T R,GHARBI M,ZHANG R,et al.Spatially-adaptive pixelwise networks for fast image translation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:14882-14891.
[8]WANG T C,LIU M Y,ZHU J Y,et al.High-resolution image synthesis and semantic manipulation with conditional gans[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:8798-8807.
[9]PARK T,LIU M Y,WANG T C,et al.Semantic image synthesis with spatially-adaptive normalization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:2337-2346.
[10]KARRAS T,AITTALA M,HELLSTEN J,et al.Training generative adversarial networks with limited data[J].Advances in Neural Information Processing Systems,2020,33:12104-12114.
[11]ZHAO S,LIU Z,LIN J,et al.Differentiable augmentation for data-efficient gan training[J].Advances in Neural Information Processing Systems,2020,33:7559-7570.
[12]ISOLA P,ZHU J Y,ZHOU T,et al.Image-to-image translation with conditional adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:1125-1134.
[13]SHRIVASTAVA A,PFISTER T,TUZEL O,et al.Learning from simulated and unsupervised images through adversarial training[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:2107-2116.
[14]CHEN Q,KOLTUN V.Photographic image synthesis with cascaded refinement networks[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:1511-1520.
[15]KIM T,CHA M,KIM H,et al.Learning to discover cross-domain relations with generative adversarial networks[C]//International Conference on Machine Learning.PMLR,2017:1857-1865.
[16]ZHU J Y,PARK T,ISOLA P,et al.Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:2223-2232.
[17]YOO J,UH Y,CHUN S,et al.Photorealistic style transfer via wavelet transforms[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:9036-9045.
[18]DOSOVITSKIY A,BROX T.Generating images with perceptual similarity metrics based on deep networks[J].Advances in Neural Information Processing Systems,2016,29:658-666.
[19]JOHNSON J,ALAHI A,LI F F.Perceptual losses for real-time style transfer and super-resolution[C]//European Conference on Computer Vision.Cham:Springer,2016:694-711.
[20]PARK T,EFROS A A,ZHANG R,et al.Contrastive learning for unpaired image-to-image translation[C]//European Conference on Computer Vision.Cham:Springer,2020:319-345.
[21]ZHENG C,CHAM T J,CAI J.The spatially-correlative loss for various image translation tasks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:16407-16417.
[22]LIU M Y,HUANG X,MALLYA A,et al.Few-shot unsupervised image-to-image translation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:10551-10560.
[23]SAITO K,SAENKO K,LIU M Y.Coco-funit:Few-shot unsupervised image translation with a content conditioned style encoder[C]//European Conference on Computer Vision.Cham:Springer,2020:382-398.
[24]ZHANG H,ZHANG Z,ODENA A,et al.Consistency regularization for generative adversarial networks[C]//8th International Conference on Learning Representations.Addis Ababa,Ethiopia,2020.
[25]TRAN N T,TRAN V H,NGUYEN N B,et al.Towards good practices for data augmentation in gan training[J].arXiv:2006.05338,2020.
[26]ZHAO Z,ZHANG Z,CHEN T,et al.Image augmentations for gan training[J].arXiv:2006.02595,2020.
[27]DEVRIES T,TAYLOR G W.Improved regularization of convolutional neural networks with cutout[J].arXiv:1708.04552,2017.
[28]CAO J,HOU L,YANG M H,et al.Remix:Towards image-to-image translation with limited data[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:15018-15027.
[29]TAN Z,CHEN D,CHU Q,et al.Efficient semantic image synthesis via class-adaptive normalization[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2021,44:4852-4866.
[30]CHEN Y J,CHENG S I,CHIU W C,et al.Vector Quantized Image-to-Image Translation[C]//European Conference on Computer Vision.Cham:Springer,2022:440-456.
[31]RONNEBERGER O,FISCHER P,BROX T.U-net:Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-assisted Intervention.Cham:Springer,2015:234-241.
[32]QI X,CHEN Q,JIA J,et al.Semi-parametric image synthesis[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:8808-8816.
[33]LIU X,YIN G,SHAO J,et al.Learning to predict layout-to-image conditional convolutions for semantic image synthesis[J].Advances in Neural Information Processing Systems,2019,32:570-580.
[34]LIU M Y,BREUEL T,KAUTZ J.Unsupervised image-to-image translation networks[J].Advances in Neural Information Processing Systems,2017,30:700-708.
[35]HUANG X,LIU M Y,BELONGIE S,et al.Multimodal unsupervised image-to-image translation[C]//Proceedings of the European Conference on Computer Vision (ECCV).2018:172-189.
[36]SHECHTMAN E,IRANI M.Matching local self-similarities across images and videos[C]//2007 IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2007:1-8.
[37]SHI J,MALIK J.Normalized cuts and image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2000,22(8):888-905.
[38]XU K,BA J,KIROS R,et al.Show,attend and tell:Neural image caption generation with visual attention[C]//International Conference on Machine Learning.PMLR,2015:2048-2057.
[39]CHAWLA N V,BOWYER K W,HALL L O,et al.SMOTE:synthetic minority over-sampling technique[J].Journal of Artificial Intelligence Research,2002,16:321-357.
[40]BECKHAM C,HONARI S,VERMA V,et al.On adversarial mixup resynthesis[J].Advances in Neural Information Processing Systems,2019,32:4346-4357.
[41]BERTHELOT D,CARLINI N,GOODFELLOW I,et al.Mixmatch:A holistic approach to semi-supervised learning[J].Advances in Neural Information Processing Systems,2019,32:5049-5059.
[42]DEVRIES T,TAYLOR G W.Dataset augmentation in feature space[C]//5th International Conference on Learning Representations.Toulon,France,2017.
[43]ZHANG H,CISSE M,DAUPHIN Y N,et al.mixup:Beyond empirical risk minimization[C]//6th International Conference on Learning Representations.Vancouver,BC,Canada,2018.
[44]WAN J,TANG S,ZHANG Y,et al.Hdidx:High-dimensional indexing for efficient approximate nearest neighbor search[J].Neurocomputing,2017,237:401-404.
[45]SZEGEDY C,VANHOUCKE V,IOFFE S,et al.Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:2818-2826.
[46]PEREYRA G,TUCKER G,CHOROWSKI J,et al.Regularizing neural networks by penalizing confident output distributions[C]//5th International Conference on Learning Representations.Toulon,France,2017.
[47]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[48]JIANG L,DAI B,WU W,et al.Focal frequency loss for image reconstruction and synthesis[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:13919-13929.
[49]TYLEČEK R,ŠÁRA R.Spatial pattern templates for recognition of objects with regular structure[C]//German Conference on Pattern Recognition.Berlin:Springer,2013:364-374.
[50]CORDTS M,OMRAN M,RAMOS S,et al.The cityscapes dataset for semantic urban scene understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:3213-3223.
[51]HEUSEL M,RAMSAUER H,UNTERTHINER T,et al.Gans trained by a two time-scale update rule converge to a local nash equilibrium[J].Advances in Neural Information Processing Systems,2017,30:6626-6637.
[52]YU F,KOLTUN V,FUNKHOUSER T.Dilated residual networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:472-480.