Computer Science, 2025, Vol. 52, Issue (6A): 240800086-7. doi: 10.11896/jsjkx.240800086
• Image Processing & Multimedia Technology •
XU Yutao, TANG Shouguo