Image Arbitrary Style Transfer via Criss-cross Attention

doi:10.11896/jsjkx.210700236

Abstract

Abstract: Image style transfer converts an ordinary photo into an image rendered in another artistic style. With the development of deep learning, arbitrary style transfer algorithms have emerged that generate a stylized image for any given style. To address the difficulty of adapting to global and local styles simultaneously while maintaining spatial consistency, this paper proposes an arbitrary style transfer network that incorporates criss-cross attention; by capturing long-range dependencies, it efficiently generates stylized images whose global and local styles are coordinated. To address distortion of the content structure in stylized images, a set of parallel channel and spatial attention networks is added before style transfer, which further refines key features and retains key information. In addition, a new loss function is proposed that eliminates artifacts while better preserving the structural information of the content image. The algorithm matches each content feature with the semantically closest style feature according to the semantic spatial distribution of the content image, adjusts local style efficiently and flexibly, and retains more of the original content structure. Experimental results show that the proposed algorithm generates high-quality stylized images in arbitrary styles with better visual effects.

Keywords: Long-range dependencies, Criss-cross attention, Convolutional neural network, Arbitrary style transfer, Feature fusion, Channel and spatial attention
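The abstract names criss-cross attention as the mechanism that captures long-range dependencies when matching content features to style features, but it does not give the layer design. The following is only a minimal sketch of the general idea, under stated assumptions: content and style feature maps share the same H x W size, queries come from the content map, keys and values both come from the style map, and each position attends only to its own row and column. The function names, the scaling by the square root of the channel count, and the plain nested-list representation are illustrative choices, not the paper's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def criss_cross_positions(i, j, height, width):
    """Positions sharing row i or column j with (i, j); (i, j) appears once."""
    row = [(i, w) for w in range(width)]
    col = [(h, j) for h in range(height) if h != i]
    return row + col


def criss_cross_attention(content, style):
    """For each content position, blend style vectors along its row and column.

    content, style: H x W grids (nested lists) of C-dimensional feature vectors.
    Assumption: queries come from content; keys and values both come from style.
    """
    height, width = len(content), len(content[0])
    channels = len(content[0][0])
    out = [[None] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            query = content[i][j]
            positions = criss_cross_positions(i, j, height, width)
            # Scaled dot-product similarity between the content query
            # and each style key on the criss-cross path.
            scores = [
                sum(q * k for q, k in zip(query, style[h][w])) / math.sqrt(channels)
                for h, w in positions
            ]
            weights = softmax(scores)
            # Weighted sum of the style vectors on the same path.
            out[i][j] = [
                sum(wt * style[h][w][c] for wt, (h, w) in zip(weights, positions))
                for c in range(channels)
            ]
    return out
```

Each position attends to H + W - 1 locations instead of all H x W, which is where the efficiency of this kind of attention comes from. Applying the operation a second time lets information from any position reach any other, since every pair of positions is connected through a shared row or column.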

CLC number: TP391.41

YANG Yue, FENG Tao, LIANG Hong, YANG Yang. Image Arbitrary Style Transfer via Criss-cross Attention[J]. Computer Science, 2022, 49(6A): 345-352. https://doi.org/10.11896/jsjkx.210700236


