Computer Science, 2022, Vol. 49, Issue 10: 151-158. doi: 10.11896/jsjkx.210900159
• Computer Graphics & Multimedia •
FANG Zhong-jun1,2, ZHANG Jing1, LI Dong-dong1,2