Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 231000013-6. doi: 10.11896/jsjkx.231000013

• Image Processing & Multimedia Technology •


Event-based Camera Object Detection Algorithm for Cross-modal Noisy Annotations Filtering

HU Gang, LIANG Dong, HUANG Shengjun   

  1. School of Computer Science and Technology,Nanjing University of Aeronautics and Astronautics,Nanjing 211106,China
  • Online:2024-11-16 Published:2024-11-13
  • Corresponding author: HUANG Shengjun (huangsj@nuaa.edu.cn)
  • About author: HU Gang, born in 1998, postgraduate (hugang@nuaa.edu.cn). His main research interests include computer vision and machine learning.
    HUANG Shengjun, born in 1986, Ph.D, professor, Ph.D supervisor, is a member of CCF (No.42916S). His main research interests include machine learning and data mining.



Abstract: Event-based cameras offer high temporal resolution, high dynamic range and low power consumption, and are therefore often used for object detection in scenarios where traditional cameras are limited (high speed, strong light, low light, etc.). However, because the pixels of an event camera fire asynchronously, the event sequences it outputs are difficult to annotate manually, so existing methods obtain event-sequence annotations by transferring annotations from RGB images. The transferred annotations contain numerous noisy labels, and some objects in the event sequences have blurred textures, which makes it difficult to achieve satisfactory model performance. To address this problem, an event-based camera object detection algorithm with cross-modal noisy annotation filtering is proposed. The algorithm uses a pre-trained event-based detector to screen open-source RGB object detection datasets, selecting the RGB images that are most valuable for training the event-based detector; these images are combined with event images to form cross-modal mixed images, which help the detector identify and localize objects in event images more accurately. To mitigate the impact of noisy annotations on detector performance, a multi-stage joint optimization strategy for object detection is designed: when each stage of training completes, noisy annotations are identified among the global annotations, corrected, and then used in the next stage. Experimental results show that, on the 1Mpx Detection Dataset, the proposed algorithm yields an 8.35% model gain over the baseline, significantly outperforming noisy-label learning methods such as Co-teaching and O2U-Net; specifically, cross-modal mixed-image training and the joint optimization framework contribute model gains of 6.44% and 4.77%, respectively.
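The abstract gives only a high-level description of the two components. As an illustration of the first, the sketch below shows one plausible way to select valuable RGB images with a pre-trained detector and blend them with event frames. The uncertainty criterion, the mixup-style blending and all function names (image_value, select_rgb, mix_cross_modal) are assumptions for illustration, not the paper's actual implementation.

import numpy as np

def image_value(det_scores):
    """Proxy 'value' of an RGB image for the event detector: mean
    uncertainty of the pre-trained detector's box confidences
    (scores near 0.5 are most informative). Hypothetical criterion."""
    s = np.asarray(det_scores, dtype=np.float32)
    if s.size == 0:
        return 0.0
    return float(np.mean(1.0 - np.abs(2.0 * s - 1.0)))

def select_rgb(rgb_images, score_lists, k):
    """Keep the k RGB images on which the detector is most uncertain."""
    order = np.argsort([-image_value(s) for s in score_lists])
    return [rgb_images[i] for i in order[:k]]

def mix_cross_modal(event_img, rgb_img, alpha=0.5):
    """Blend an event frame with a selected RGB image (mixup-style);
    annotations from both images would be kept as training targets."""
    return alpha * event_img.astype(np.float32) + (1.0 - alpha) * rgb_img.astype(np.float32)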
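Similarly, the multi-stage joint optimization can be pictured as alternating between training and an annotation-correction pass. The IoU/confidence thresholds and the detector.fit/detector.predict interface below are hypothetical placeholders, since the abstract does not specify how noisy boxes are identified or corrected.

def iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def refine_annotations(detector, dataset, iou_thr=0.5, conf_thr=0.8):
    """One correction pass between stages: annotations the current
    detector disagrees with are treated as noisy and replaced by
    confident predictions. Thresholds are illustrative."""
    refined = []
    for img, boxes in dataset:
        preds = detector.predict(img)  # assumed API: list of (box, score)
        kept = []
        for gt in boxes:
            match = max(preds, key=lambda p: iou(p[0], gt), default=None)
            if match is not None and iou(match[0], gt) >= iou_thr:
                kept.append(gt)          # agreement: keep original label
            elif match is not None and match[1] >= conf_thr:
                kept.append(match[0])    # noisy: correct with prediction
            # otherwise drop the box for the next stage
        refined.append((img, kept))
    return refined

def joint_optimization(detector, dataset, stages=3):
    """Alternate training stages with annotation correction."""
    for _ in range(stages):
        detector.fit(dataset)            # assumed training API
        dataset = refine_annotations(detector, dataset)
    return detector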

Key words: Event-based camera, Object detection, Noisy annotations, Cross-modal, Joint optimization

CLC Number: TP391.4

References:
[1]WANG L,LIU Z,SHI D X,et al.Fusion Tracker:Single-object Tracking Framework Fusing Image Features and Event Features[J].Computer Science,2023,50(10):96-103.
[2]HAN J,YANG Y,ZHOU C,et al.EvIntSR-Net:Event guided multiple latent frames reconstruction and super-resolution[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:4882-4891.
[3]XU Q,DENG J,SHEN J R,et al.A Review of Image Reconstruction Based on Event Cameras[J].Journal of Electronics & Information Technology,2023,45(8):2699-2709.
[4]LICHTSTEINER P,POSCH C,DELBRUCK T.A 128×128 120 dB 15 μs latency asynchronous temporal contrast vision sensor[J].IEEE Journal of Solid-State Circuits,2008,43(2):566-576.
[5]GALLEGO G,DELBRÜCK T,ORCHARD G,et al.Event-based vision:A survey[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2020,44(1):154-180.
[6]SABATER A,MONTESANO L,MURILLO A C.Event Transformer.A sparse-aware solution for efficient event data processing[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:2677-2686.
[7]WAN J,XIA M,HUANG Z,et al.Event-Based Pedestrian Detection Using Dynamic Vision Sensors[J].Electronics,2021,10(8):888.
[8]MIAO S,CHEN G,NING X,et al.Neuromorphic vision datasets for pedestrian detection,action recognition,and fall detection[J].Frontiers in Neurorobotics,2019,13:38.
[9]HE D C,WANG L.Texture unit,texture spectrum,and texture analysis[J].IEEE Transactions on Geoscience and Remote Sensing,1990,28(4):509-512.
[10]PEROT E,DE TOURNEMIRE P,NITTI D,et al.Learning to detect objects with a 1 megapixel event camera[J].Advances in Neural Information Processing Systems,2020,33:16639-16652.
[11]FINATEU T,NIWA A,MATOLIN D,et al.5.10 a 1280×720 back-illuminated stacked temporal contrast event-based vision sensor with 4.86 μm pixels,1.066 GEPS readout,programmable event-rate controller and compressive data-formatting pipeline[C]//2020 IEEE International Solid-State Circuits Conference(ISSCC).IEEE,2020.
[12]HUANG J,QU L,JIA R,et al.O2U-Net:A simple noisy label detection approach for deep neural networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:3326-3334.
[13]HAN B,YAO Q,YU X,et al.Co-teaching:Robust training of deep neural networks with extremely noisy labels[J].arXiv:1804.06872,2018.
[14]TANAKA D,IKAMI D,YAMASAKI T,et al.Joint optimization framework for learning with noisy labels[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:5552-5560.
[15]LI J,XIONG C,SOCHER R,et al.Towards noise-resistant object detection with noisy annotations[J].arXiv:2003.01285,2020.
[16]BOCHKOVSKIY A,WANG C Y,LIAO H Y M.Yolov4:Optimal speed and accuracy of object detection[J].arXiv:2004.10934,2020.
[17]LIU K,QIAN X,WANG Z Q.Survey on active learning algorithms[J].Computer Engineering and Applications,2012,48(34):1-4.
[18]XIE Y,TOMIZUKA M,ZHAN W.Towards general and efficient active learning[J].arXiv:2112.07963,2021.
[19]PAN S J,YANG Q.A survey on transfer learning[J].IEEE Transactions on Knowledge and Data Engineering,2009,22(10):1345-1359.
[20]GANIN Y,USTINOVA E,AJAKAN H,et al.Domain-adversarial training of neural networks[J].Journal of Machine Learning Research,2016,17(1):2096-2030.
[21]JIANG J,CHEN B,WANG J,et al.Decoupled adaptation for cross-domain object detection[J].arXiv:2110.02578,2021.
[22]VAN DER AALST W M P,RUBIN V,VERBEEK H M W,et al.Process mining:a two-step approach to balance between underfitting and overfitting[J].Software & Systems Modeling,2010,9:87-111.
[23]YU F,CHEN H,WANG X,et al.BDD100K:A diverse driving dataset for heterogeneous multitask learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:2636-2645.
[24]HAN J,LIANG X,XU H,et al.SODA10M:a large-scale 2D self/semi-supervised object detection dataset for autonomous driving[J].arXiv:2106.11118,2021.
[25]TARVAINEN A,VALPOLA H.Mean teachers are better role models:Weight-averaged consistency targets improve semi-supervised deep learning results[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.2017:1195-1204.