Computer Science, 2024, Vol. 51, Issue (5): 85-91. DOI: 10.11896/jsjkx.230300202
• Computer Graphics & Multimedia •
HE Shiyang1, WANG Zhaohui2, GONG Shengrong1,3, ZHONG Shan3
[1] BAO Kainan, ZHANG Junbo, SONG Li, LI Tianrui. ST-WaveMLP: Spatio-Temporal Global-aware Network for Traffic Flow Prediction[J]. Computer Science, 2024, 51(5): 27-34.
[2] ZHANG Jianliang, LI Yang, ZHU Qingshan, XUE Hongling, MA Junwei, ZHANG Lixia, BI Sheng. Substation Equipment Malfunction Alarm Algorithm Based on Dual-domain Sparse Transformer[J]. Computer Science, 2024, 51(5): 62-69.
[3] SONG Jianfeng, ZHANG Wenying, HAN Lu, HU Guozheng, MIAO Qiguang. Multi-stage Intelligent Color Restoration Algorithm for Black-and-White Movies[J]. Computer Science, 2024, 51(5): 92-99.
[4] SHAN Xinxin, LI Kai, WEN Ying. Medical Image Segmentation Network Integrating Full-scale Feature Fusion and RNN with Attention[J]. Computer Science, 2024, 51(5): 100-107.
[5] ZHOU Yu, CHEN Zhihua, SHENG Bin, LIANG Lei. Multi Scale Progressive Transformer for Image Dehazing[J]. Computer Science, 2024, 51(5): 117-124.
[6] BAI Xuefei, SHEN Wucheng, WANG Wenjian. Salient Object Detection Based on Feature Attention Purification[J]. Computer Science, 2024, 51(5): 125-133.
[7] HE Xiaohui, ZHOU Tao, LI Panle, CHANG Jing, LI Jiamian. Study on Building Extraction from Remote Sensing Image Based on Multi-scale Attention[J]. Computer Science, 2024, 51(5): 134-142.
[8] XU Xuejie, WANG Baohui. Multi-label Patent Classification Based on Text and Historical Data[J]. Computer Science, 2024, 51(5): 172-178.
[9] LAN Yongqi, HE Xingxing, LI Yingfang, LI Tianrui. New Graph Reduction Representation and Graph Neural Network Model for Premise Selection[J]. Computer Science, 2024, 51(5): 193-199.
[10] LI Zichen, YI Xiuwen, CHEN Shun, ZHANG Junbo, LI Tianrui. Government Event Dispatch Approach Based on Deep Multi-view Network[J]. Computer Science, 2024, 51(5): 216-222.
[11] HONG Tijing, LIU Dengfeng, LIU Yian. Radar Active Jamming Recognition Based on Multiscale Fully Convolutional Neural Network and GRU[J]. Computer Science, 2024, 51(5): 306-312.
[12] SUN Jing, WANG Xiaoxia. Convolutional Neural Network Model Compression Method Based on Cloud Edge Collaborative Subclass Distillation[J]. Computer Science, 2024, 51(5): 313-320.
[13] CHEN Runhuan, DAI Hua, ZHENG Guineng, LI Hui, YANG Geng. Urban Electricity Load Forecasting Method Based on Discrepancy Compensation and Short-term Sampling Contrastive Loss[J]. Computer Science, 2024, 51(4): 158-164.
[14] LIN Binwei, YU Zhiyong, HUANG Fangwan, GUO Xianwei. Data Completion and Prediction of Street Parking Spaces Based on Transformer[J]. Computer Science, 2024, 51(4): 165-173.
[15] WANG Ruiping, WU Shihong, ZHANG Meihang, WANG Xiaoping. Review of Vision-based Neural Network 3D Dynamic Gesture Recognition Methods[J]. Computer Science, 2024, 51(4): 193-208.