Computer Science ›› 2026, Vol. 53 ›› Issue (1): 153-162. doi: 10.11896/jsjkx.250300021

• Computer Graphics & Multimedia •

  • Corresponding author: JIANG Jie (jiejiang@nudt.edu.cn)
  • First author: ZHOU Bingquan (bingquanzhou@nudt.edu.cn)

EvR-DETR:Event-RGB Fusion for Lightweight End-to-End Object Detection

ZHOU Bingquan, JIANG Jie, CHEN Jiangmin, ZHAN Lixin   

  1. College of System Engineering, National University of Defense Technology, Changsha 410073, China
  • Received: 2025-03-04 Revised: 2025-05-08 Online: 2026-01-08
  • About author: ZHOU Bingquan, born in 2000, postgraduate. His main research interests include computer vision and event-based vision.
    JIANG Jie, born in 1974, Ph.D, professor. His main research interests include artificial intelligence and deep learning, visualization and visual analytics, virtual reality and intelligent interaction.


Abstract: Event cameras based on neuromorphic spike signals can provide information about illumination changes, compensating for the performance degradation of traditional RGB cameras in object detection under adverse environments. However, existing methods fusing event cameras with conventional cameras suffer from large model parameters and non-end-to-end training approaches, which restrict the effectiveness of modality fusion. To address this, this paper proposes a lightweight end-to-end object detection framework that integrates event and RGB information through multi-granularity fusion of multi-scale features across different network levels. By implementing lightweight fusion modules with reparameterized convolutions and enabling end-to-end training, the proposed framework enhances the model's capability to extract complementary information from both modalities, overcoming challenging conditions in autonomous driving. Evaluated on the large-scale PKU-SOD dataset containing vehicular visual data under low-light, high-speed motion blur, and normal illumination scenarios, the proposed method significantly reduces model parameters compared to state-of-the-art multimodal approaches while improving detection accuracy and inference speed, demonstrating superior performance over existing methods.
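The "reparameterized convolutions" the abstract relies on for lightweight fusion refer to the general structural-reparameterization technique (popularized by RepVGG): train with parallel branches for richer gradients, then algebraically fold them into a single convolution for fast inference. The sketch below is a minimal NumPy illustration of that general idea, not the paper's actual fusion module; the shapes and the naive `conv2d` helper are assumptions for demonstration only. It shows that a 3x3 branch plus a 1x1 branch collapse exactly into one 3x3 kernel by zero-padding the 1x1 kernel into the center tap.

```python
import numpy as np

def conv2d(x, w, b):
    # Naive "same"-padded cross-correlation: x (C,H,W), w (O,C,k,k), b (O,)
    O, C, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    out = np.zeros((O, H, W))
    for o in range(O):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(w[o] * xp[:, i:i + k, j:j + k]) + b[o]
    return out

rng = np.random.default_rng(0)
C = 4
x = rng.standard_normal((C, 8, 8))
w3, b3 = rng.standard_normal((C, C, 3, 3)), rng.standard_normal(C)
w1, b1 = rng.standard_normal((C, C, 1, 1)), rng.standard_normal(C)

# Training-time structure: two parallel branches summed.
y = conv2d(x, w3, b3) + conv2d(x, w1, b1)

# Inference-time reparameterization: fold the 1x1 kernel into the
# center tap of the 3x3 kernel and merge the biases.
w_fused = w3.copy()
w_fused[:, :, 1:2, 1:2] += w1
y_fused = conv2d(x, w_fused, b3 + b1)

assert np.allclose(y, y_fused)  # identical outputs, single conv at inference
```

Because convolution is linear in its weights, the fused kernel is exact, which is why such modules add training-time capacity at zero inference-time parameter or latency cost.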

Key words: Object detection, Neuromorphic camera, Autonomous driving, Deep learning, End-to-end object detection, Event-based object detection, Lightweight object detection

CLC number: TP391