Computer Science ›› 2020, Vol. 47 ›› Issue (12): 183-189.doi: 10.11896/jsjkx.190900181
ZHANG Kai, LI Jun-hui, ZHOU Guo-dong
[1] ALI F,HEJRATI M,AMIN M S,et al.Every Picture Tells a Story:Generating Sentences from Images[C]//Proceedings Part IV of the 11th European Conference on Computer Vision.Heraklion,Crete,Greece:Springer,2010:15-29.
[2] KULKARNI G,PREMRAJ V,ORDONEZ V,et al.Babytalk:Understanding and generating simple image descriptions[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,35(12):2891-2903.
[3] VINYALS O,TOSHEV A,BENGIO S,et al.Show and tell:A neural image caption generator[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Boston,MA,USA:IEEE,2015:3156-3164.
[4] KARPATHY A,LI F F.Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2015:3128-3137.
[5] MAO J H,XU W,YANG Y,et al.Deep captioning with multimodal recurrent neural networks (m-RNN)[J].arXiv:1412.6632.
[6] XU J,GAVVES E,FERNANDO B,et al.Guiding the long-short term memory model for image caption generation[C]//Proceedings of the IEEE International Conference on Computer Vision.2015:2407-2415.
[7] WU Q,SHEN C H,LIU L Q,et al.What value do explicit high level concepts have in vision to language problems?[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:203-212.
[8] XU K,BA J,KIROS R,et al.Show,Attend and Tell:Neural Image Caption Generation with Visual Attention[C]//Proceedings of the 32nd International Conference on Machine Learning.Lille,France:JMLR.org,2015:2048-2057.
[9] LU J S,XIONG C M,PARIKH D,et al.Knowing when to look:Adaptive attention via a visual sentinel for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:375-383.
[10] CHEN L,ZHANG H W,XIAO J,et al.SCA-CNN:Spatial and channel-wise attention in convolutional networks for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:5659-5667.
[11] LI X R,LAN W Y,DONG J F,et al.Adding Chinese Captions to Images[C]//Proceedings of the 2016 ACM International Conference on Multimedia Retrieval.New York,USA:ACM,2016:271-275.
[12] SZEGEDY C,LIU W,JIA Y Q,et al.Going deeper with convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Boston,MA,USA:IEEE,2015:1-9.
[13] RENNIE S J,MARCHERET E,MROUEH Y,et al.Self-critical sequence training for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:7008-7024.
[14] ANDERSON P,HE X D,BUEHLER C,et al.Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:6077-6086.
[15] DOGNIN P L,MELNYK I,MROUEH Y,et al.Adversarial Semantic Alignment for Improved Image Captions[J].arXiv:1805.00063v3.
[16] BITEN A F,GOMEZ L,RUSIÑOL M,et al.Good News,Everyone! Context driven entity-aware captioning for news images[J].arXiv:1904.01475.
[17] KIM D J,CHOI J,OH T H,et al.Dense Relational Captioning:Triple-Stream Networks for Relationship-Based Captioning[J].arXiv:1903.05942v3.
[18] MITRA S,AVRA L J,MCCLUSKEY E J,et al.Scan synthesis for one-hot signals[C]//Proceedings of the International Test Conference.IEEE,1997:714-722.
[19] WERLEN L M,PAPPAS N,RAM D,et al.Self-attentive residual decoder for neural machine translation[J].arXiv:1709.04849.
[20] LIN T Y,MAIRE M,BELONGIE S,et al.Microsoft COCO:Common objects in context[C]//European Conference on Computer Vision.Cham:Springer,2014:740-755.
[21] YOSHIKAWA Y,SHIGETO Y,TAKEUCHI A.STAIR Captions:Constructing a large-scale Japanese image caption dataset[J].arXiv:1705.00823.
[22] PAPINENI K,ROUKOS S,WARD T,et al.BLEU:a Method for Automatic Evaluation of Machine Translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.Philadelphia,PA,USA:ACL,2002:311-318.
[23] DENKOWSKI M,LAVIE A.Meteor Universal:Language specific translation evaluation for any target language[C]//Proceedings of the Ninth Workshop on Statistical Machine Translation.2014:376-380.
[24] LIN C Y.ROUGE:A package for automatic evaluation of summaries[C]//Post-Conference Workshop of ACL 2004.2004.
[25] VEDANTAM R,ZITNICK C L,PARIKH D,et al.CIDEr:Consensus-based image description evaluation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2015:4566-4575.
[26] ANDERSON P,FERNANDO B,JOHNSON M,et al.SPICE:Semantic propositional image caption evaluation[C]//European Conference on Computer Vision.Cham:Springer,2016:382-398.
[27] HE K M,ZHANG X Y,REN S Q,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:770-778.
[28] RUSSAKOVSKY O,DENG J,SU H,et al.ImageNet large scale visual recognition challenge[J].International Journal of Computer Vision,2015,115(3):211-252.
[29] KINGMA D P,BA J.Adam:A method for stochastic optimization[J].arXiv:1412.6980.
[30] WISEMAN S,RUSH A M.Sequence-to-sequence learning as beam-search optimization[J].arXiv:1606.02960.
[31] IOFFE S,SZEGEDY C.Batch Normalization:Accelerating Deep Network Training by Reducing Internal Covariate Shift[C]//Proceedings of the 32nd International Conference on Machine Learning.Lille,France:JMLR.org,2015:448-456.
[32] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Advances in Neural Information Processing Systems.2017:5998-6008.