Computer Science, 2021, Vol. 48, Issue (3): 87-96. doi: 10.11896/jsjkx.201200174
Special Topic: Advances in Multimedia Technology
牛玉磊, 张含望
NIU Yu-lei, ZHANG Han-wang
Abstract: Visual question answering and visual dialog are important research tasks in artificial intelligence and representative problems at the intersection of computer vision and natural language processing. Both tasks require a machine to answer single-round or multi-round natural-language questions about the content of a given image. They place high demands on a machine's perception, cognition, and reasoning abilities, and hold practical promise for cross-modal human-computer interaction. This paper surveys recent research progress on visual question answering and visual dialog, organizes the relevant datasets and algorithms, summarizes the open research challenges, and finally discusses future development trends for the field.
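The task setup described in the abstract — answering a natural-language question grounded in a given image — is, in many models from this literature, implemented by scoring image region features against an encoded question vector and pooling them into an attended visual feature. The following is a minimal illustrative sketch of that question-guided attention step; all dimensions, names, and the random features are assumptions for illustration, not details taken from the surveyed paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def question_guided_attention(region_feats, question_feat):
    """Score each image region against the question vector,
    then pool the regions into a single attended visual feature.

    region_feats:  (num_regions, feat_dim) image region features
    question_feat: (feat_dim,) encoded question vector
    """
    scores = region_feats @ question_feat   # (num_regions,) relevance scores
    weights = softmax(scores)               # attention distribution over regions
    attended = weights @ region_feats       # (feat_dim,) attended visual feature
    return attended, weights

# Toy inputs: e.g. 36 detected regions with 512-d features (common in
# bottom-up-attention pipelines), and a 512-d question encoding.
rng = np.random.default_rng(0)
regions = rng.normal(size=(36, 512))
question = rng.normal(size=512)
visual, attn = question_guided_attention(regions, question)
```

In a full model, the attended visual feature would then be fused with the question representation (e.g. by elementwise product or bilinear pooling) and fed to an answer classifier; this sketch only shows the attention-and-pool step.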