Computer Science ›› 2021, Vol. 48 ›› Issue (3): 87-96. doi: 10.11896/jsjkx.201200174

Special Topic: Advances in Multimedia Technology


Survey on Visual Question Answering and Dialogue

NIU Yu-lei, ZHANG Han-wang   

  1. School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798
  • Received: 2020-12-20  Revised: 2021-01-28  Online: 2021-03-15  Published: 2021-03-05
  • Corresponding author: NIU Yu-lei (yn.yuleiniu@gmail.com)
  • About author: NIU Yu-lei, born in 1992, Ph.D. His main research interests include vision-language and visual reasoning.
  • Supported by:
    Alibaba-NTU Singapore Joint Research Institute (NTU-Alibaba JRI).


Abstract: Visual question answering and dialogue are important research tasks in artificial intelligence and representative problems at the intersection of computer vision and natural language processing. These tasks require a machine to answer single-round or multi-round natural-language questions about specified visual content. They place high demands on a machine's perception, cognition, and reasoning abilities, and hold practical promise for cross-modal human-computer interaction. This paper reviews recent research progress on visual question answering and dialogue, summarizes the available datasets and algorithms, and distills the main challenges and open problems. Finally, it discusses future research trends in visual question answering and dialogue.
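
As a concrete illustration of the task definition above, the following minimal Python sketch contrasts the single-round VQA setting with the multi-round visual dialogue setting. It is only a sketch of the data involved in each task: every class and field name here is hypothetical, chosen for illustration, and not taken from the surveyed paper or from any benchmark's API.

    # Hypothetical data structures contrasting single-round VQA with
    # multi-round visual dialogue; all names are illustrative only.
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class VQAExample:
        """Single-round task: answer one question about one image."""
        image_path: str
        question: str
        answer: str

    @dataclass
    class VisualDialogueExample:
        """Multi-round task: each new question is answered conditioned on
        the image and on all previous question-answer rounds (the history),
        e.g. to resolve co-references such as 'it' or 'the dog'."""
        image_path: str
        caption: str
        history: List[Tuple[str, str]] = field(default_factory=list)

        def record_round(self, question: str, answer: str) -> None:
            # Keep the completed round so later questions can refer back to it.
            self.history.append((question, answer))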

Key words: Deep learning, Vision and language, Visual dialogue, Visual question answering, Visual reasoning

CLC Number: TP391