Computer Science, 2025, Vol. 52, Issue (6A): 240400101-8. doi: 10.11896/jsjkx.240400101
• Artificial Intelligence •
XU Yutao, TANG Shouguo