Computer Science (计算机科学), 2022, Vol. 49, Issue (6): 180-186. doi: 10.11896/jsjkx.211100129
陈章辉, 熊贇
CHEN Zhang-hui, XIONG Yun
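Abstract: Image captioning aims to generate text that accurately describes the content of an input image. Stylized image captioning additionally takes linguistic style into account: the caption should express a specified style appropriately, which makes the generated captions more diverse. To better incorporate stylistic elements into generated captions, this paper proposes a stylized image captioning model based on a disentangle-retrieve-generate framework. The model first splits the sentences of a stylized corpus into content words and style words and builds a content-style word memory module. It then retrieves from the memory module the style words that match the factual caption of the image. Finally, the factual caption and the retrieved style words are fed into a language model to generate the stylized caption. Experiments on real-world datasets show that the proposed model outperforms existing methods on all evaluation metrics and can express a specified style while describing the image content.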
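The three stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The fixed style lexicon `STYLE_WORDS`, the Jaccard-overlap retrieval, the `[STYLE]` separator, and all function names are assumptions made for illustration. The abstract does not say how words are separated, how matching is scored, or how the language model input is formatted, and the final generation step is left out.

```python
from dataclasses import dataclass

# Hypothetical style lexicon (an assumption): the abstract does not state
# how style words are told apart from content words.
STYLE_WORDS = {"lovely", "adorable", "gorgeous", "happily", "sweet"}


@dataclass(frozen=True)
class MemoryEntry:
    content: frozenset  # content words of one stylized sentence
    style: tuple        # style words of the same sentence, in order


def decouple(sentence):
    """Split a sentence into (content words, style words)."""
    tokens = sentence.lower().split()
    content = [t for t in tokens if t not in STYLE_WORDS]
    style = [t for t in tokens if t in STYLE_WORDS]
    return content, style


def build_memory(corpus):
    """Build a content-to-style memory from a stylized corpus."""
    memory = []
    for sentence in corpus:
        content, style = decouple(sentence)
        if style:  # keep only sentences that carry style words
            memory.append(MemoryEntry(frozenset(content), tuple(style)))
    return memory


def jaccard(a, b):
    """Jaccard overlap of two word sets (a stand-in matching score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(factual_caption, memory):
    """Return the style words of the best-matching memory entry."""
    query = frozenset(factual_caption.lower().split())
    best = max(memory, key=lambda e: jaccard(query, e.content), default=None)
    return list(best.style) if best else []


def build_generator_input(factual_caption, style_words):
    """Join caption and style words into one language-model input string."""
    return f"{factual_caption} [STYLE] {' '.join(style_words)}"


corpus = [
    "a lovely dog runs happily on the grass",
    "a gorgeous sunset over the sea",
]
memory = build_memory(corpus)
caption = "a dog runs on the beach"
style = retrieve(caption, memory)
prompt = build_generator_input(caption, style)
print(style)
print(prompt)
```

In this sketch, `prompt` is what would be passed to a language model to produce the stylized caption. A learned retriever, such as one based on sentence embeddings, could replace the word-overlap score without changing the overall structure.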