Computer Science, 2025, Vol. 52, Issue (8): 222-231. doi: 10.11896/jsjkx.240600082
• Computer Graphics & Multimedia •
LIU Jian, YAO Renyuan, GAO Nan, LIANG Ronghua, CHEN Peng