Computer Science, 2025, Vol. 52, Issue (12): 252-259. doi: 10.11896/jsjkx.241000105
• Artificial Intelligence •
YAN Yujing1, HOU Xia1, GUO Yuting2, ZHANG Mingliang1, SONG Wenfeng1
| [1] | LIU Wei, XU Yong, FANG Juan, LI Cheng, ZHU Yujun, FANG Qun, HE Xin. Multimodal Air-writing Gesture Recognition Based on Radar-Vision Fusion [J]. Computer Science, 2025, 52(9): 259-268. |
| [2] | WANG Yuanlong, ZHANG Ningqian, ZHANG Hu. Visual Storytelling Based on Planning Learning [J]. Computer Science, 2025, 52(9): 269-275. |
| [3] | SU Zhiyuan, ZHAO Lixu, HAO Zhiheng, BAI Rufeng. Survey of Artificial Intelligence Ensuring eVTOL Flight Safety in the Context of Low-altitude Economy [J]. Computer Science, 2025, 52(6A): 250200050-13. |
| [4] | XU Yutao, TANG Shouguo. External Knowledge Query-based for Visual Question Answering [J]. Computer Science, 2025, 52(6A): 240400101-8. |
| [5] | GAO Junyi, ZHANG Wei, LI Zelin. YOLO-BFEPS:Efficient Attention-enhanced Cross-scale YOLOv10 Fire Detection Model [J]. Computer Science, 2025, 52(6A): 240800134-9. |
| [6] | XU Yutao, TANG Shouguo. Visual Question Answering Integrating Visual Common Sense Features and Gated Counting Module [J]. Computer Science, 2025, 52(6A): 240800086-7. |
| [7] | LI Xiaolan, MA Yong. Study on Lightweight Flame Detection Algorithm with Progressive Adaptive Feature Fusion [J]. Computer Science, 2025, 52(4): 64-73. |
| [8] | CAO Wenbo, WEI Mingyang, DUAN Xiaoyong, LIU Xueyuan. Three-dimensional Object Detection Algorithm of Road Scene Based on Attention Mechanism [J]. Computer Science, 2025, 52(11A): 241100112-7. |
| [9] | ZHANG Xiaorui, XU Yanan, SUN Wei. CINN:A High-speed and JPEG-resistant Medical Image Watermarking Network [J]. Computer Science, 2025, 52(11A): 241100037-7. |
| [10] | LI Yujie, MA Zihang, WANG Yifu, WANG Xinghe, TAN Benying. Survey of Vision Transformers(ViT) [J]. Computer Science, 2025, 52(1): 194-209. |
| [11] | ZHANG Jian, LI Hui, ZHANG Shengming, WU Jie, PENG Ying. Review of Pre-training Methods for Visually-rich Document Understanding [J]. Computer Science, 2025, 52(1): 259-276. |
| [12] | ZHU Fukun, TENG Zhen, SHAO Wenze, GE Qi, SUN Yubao. Semantic-guided Neural Network Critical Data Routing Path [J]. Computer Science, 2024, 51(9): 155-161. |
| [13] | CAI Wenliang, HUANG Jun. Lane Detection Method Based on RepVGG [J]. Computer Science, 2024, 51(7): 236-243. |
| [14] | HUANG Haixin, CAI Mingqi, WANG Yuyao. Review of Point Cloud Semantic Segmentation Based on Graph Convolutional Neural Networks [J]. Computer Science, 2024, 51(6A): 230400196-7. |
| [15] | LU Dongsheng, LONG Hua. Method for Homologous Spectrum Monitoring Data Identification Based on Spectrum SIFT [J]. Computer Science, 2024, 51(6A): 230300177-7. |