Computer Science (计算机科学) ›› 2020, Vol. 47 ›› Issue (12): 183-189. doi: 10.11896/jsjkx.190900181
ZHANG Kai, LI Jun-hui, ZHOU Guo-dong
Abstract: Most image captioning research generates captions in a single language. As languages increasingly intermingle worldwide, generating captions for a single image in two or even more languages is a natural next step, allowing speakers of different native languages to understand how others describe the same picture. This paper therefore proposes a bilingual image captioning method that generates captions in two languages for the same image simultaneously. The model consists of one encoder and two distinct decoders: the encoder, based on a convolutional neural network, extracts image features, while the two decoders, based on long short-term memory (LSTM) networks, each decode into one of the two languages. Because the two captions are mutual translations of each other, a joint generation model over the bilingual corpora is proposed. Specifically, the decoding side generates the two captions in an alternating fashion, so that when predicting the next word in one language, the model can exploit not only that caption's own history but also the history of the other language's caption, improving the generation quality of both captions at once. Experimental results on the MSCOCO 2014 dataset show that joint bilingual caption generation improves performance in both languages: English captioning gains 1.0 BLEU_4 and 0.98 CIDEr over monolingual English generation, and Japanese captioning gains 1.0 BLEU_4 and 0.31 CIDEr over monolingual Japanese generation.
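The alternating decoding scheme described above can be sketched in miniature. This is an illustrative assumption-laden toy, not the authors' actual model: the real system uses a CNN encoder and two LSTM decoders, whereas here `interleave_decode` and the two toy step functions only demonstrate how each language's next-word prediction conditions on both caption histories.

```python
def interleave_decode(step_en, step_ja, max_len):
    """Alternately extend an English and a Japanese caption.

    step_en / step_ja are callables (own_history, other_history) -> token,
    so every prediction can condition on BOTH caption histories, mirroring
    the paper's joint generation idea.
    """
    en, ja = [], []
    for _ in range(max_len):
        en.append(step_en(en, ja))  # English step sees the Japanese history
        ja.append(step_ja(ja, en))  # Japanese step sees the updated English history
    return en, ja


# Deterministic stand-ins for the two LSTM decoders; each token records how
# much of each history was visible when it was produced.
def toy_en(own, other):
    return f"en_{len(own)}_{len(other)}"


def toy_ja(own, other):
    return f"ja_{len(own)}_{len(other)}"


en, ja = interleave_decode(toy_en, toy_ja, max_len=3)
print(en)  # ['en_0_0', 'en_1_1', 'en_2_2']
print(ja)  # ['ja_0_1', 'ja_1_2', 'ja_2_3']
```

Note how the Japanese tokens always see one more English token than vice versa: within each alternation step, the English word is emitted first and is immediately available to the Japanese decoder.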