Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 231000013-6. doi: 10.11896/jsjkx.231000013

• Image Processing & Multimedia Technology •

Event-based Camera Object Detection Algorithm for Cross-modal Noisy Annotations Filtering

HU Gang, LIANG Dong, HUANG Shengjun   

  1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Online: 2024-11-16 Published: 2024-11-13
  • About author: HU Gang, born in 1998, postgraduate. His main research interests include computer vision and machine learning.
    HUANG Shengjun, born in 1986, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No. 42916S). His main research interests include machine learning and data mining.

Abstract: Event-based cameras are widely used for object detection in scenarios where traditional cameras struggle (high speed, strong light, low light, etc.) thanks to their high temporal resolution, high dynamic range and low power consumption. However, the event sequences output by an event camera are difficult to label manually because of their pixel-level asynchrony, so existing methods obtain event-sequence annotations by migrating annotations from RGB images. The migrated annotations contain numerous inaccurate bounding boxes, and some object textures in the event sequences are blurry, which leads to poor model performance. To address this problem, an event-based camera object detection algorithm with cross-modal noisy annotation filtering is proposed. The method uses a pre-trained event-based camera detector to filter open-source RGB object detection datasets and selects the RGB images that are most valuable for training the event-based camera detector. The selected RGB images are combined with event images to construct cross-domain mixed images, which help the detector identify and locate objects in event images more accurately. To mitigate the impact of noisy annotations on detector performance, a multi-stage joint optimization strategy for object detection is designed: after each training stage, noisy annotations are identified among the global annotations and corrected for use in the next stage. Experimental results on the 1Mpx Detection Dataset show that the proposed method provides a model gain of 8.35% over the baseline model, significantly outperforming noisy-label learning methods such as Co-teaching and O2U-Net. Specifically, cross-modal mixed-image training and the joint optimization framework provide model gains of 6.44% and 4.77%, respectively.
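
The per-stage correction of noisy annotations described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how noisy annotations are identified or corrected, so everything here is an assumption for illustration only: the IoU-based agreement rule, the function name `correct_annotations`, and the thresholds `score_thr`, `keep_iou` and `match_iou` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def correct_annotations(
    annotations: List[Box],
    predictions: List[Tuple[Box, float]],
    score_thr: float = 0.8,   # hypothetical: minimum detector confidence
    keep_iou: float = 0.7,    # hypothetical: agreement needed to keep a box
    match_iou: float = 0.3,   # hypothetical: overlap needed to replace a box
) -> List[Box]:
    """Return the annotation set to use in the next training stage.

    Assumed rule (not from the paper): an annotation that agrees with a
    confident prediction is kept, a loosely matching one is replaced by
    that prediction, and one without any support is dropped as noise.
    """
    confident = [box for box, score in predictions if score >= score_thr]
    corrected: List[Box] = []
    for ann in annotations:
        best = max(confident, key=lambda p: iou(ann, p), default=None)
        overlap = iou(ann, best) if best is not None else 0.0
        if overlap >= keep_iou:
            corrected.append(ann)    # treated as clean
        elif overlap >= match_iou:
            corrected.append(best)   # treated as noisy: use the prediction
        # otherwise: no confident prediction supports it, so drop it
    return corrected


anns = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
preds = [((0, 0, 10, 10), 0.9), ((22, 22, 32, 32), 0.95), ((50, 50, 60, 60), 0.2)]
next_stage_annotations = correct_annotations(anns, preds)
```

In a multi-stage setup, such a function would be called after each training stage with the current detector's predictions, and its output would serve as the annotations for the following stage.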

Key words: Event-based camera, Object detection, Noisy annotations, Cross-modal, Joint optimization

CLC Number: 

  • TP391.4
[1]WANG L,LIU Z,SHI D X,et al.Fusion Tracker:Single-object Tracking Framework Fusing Image Features and Event Features[J].Computer Science,2023,50(10):96-103.
[2]HAN J,YANG Y,ZHOU C,et al.EvIntSR-Net:Event guided multiple latent frames reconstruction and super-resolution[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:4882-4891.
[3]XU Q,DENG J,SHEN J R,et al.A Review of Image Reconstruction Based on Event Cameras[J].Journal of Electronics & Information Technology, 2023,45(8):2699-2709.
[4]LICHTSTEINER P,POSCH C,DELBRUCK T.A 128×128 120 dB 15 μs Latency Asynchronous Temporal Contrast Vision Sensor[J].IEEE Journal of Solid-State Circuits,2008,43(2):566-576.
[5]GALLEGO G,DELBRÜCK T,ORCHARD G,et al.Event-based vision:A survey[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2020,44(1):154-180.
[6]SABATER A,MONTESANO L,MURILLO A C.Event Transformer.A sparse-aware solution for efficient event data processing[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:2677-2686.
[7]WAN J,XIA M,HUANG Z,et al.Event-Based Pedestrian Detection Using Dynamic Vision Sensors[J].Electronics,2021,10(8):888.
[8]MIAO S,CHEN G,NING X,et al.Neuromorphic vision datasets for pedestrian detection,action recognition,and fall detection[J].Frontiers in Neurorobotics,2019,13:38.
[9]HE D C,WANG L.Texture unit,texture spectrum,and texture analysis[J].IEEE Transactions on Geoscience and Remote Sensing,1990,28(4):509-512.
[10]PEROT E,DE TOURNEMIRE P,NITTI D,et al.Learning to detect objects with a 1 megapixel event camera[J].Advances in Neural Information Processing Systems,2020,33:16639-16652.
[11]FINATEU T,NIWA A,MATOLIN D,et al.5.10 A 1280×720 back-illuminated stacked temporal contrast event-based vision sensor with 4.86 μm pixels,1.066 GEPS readout,programmable event-rate controller and compressive data-formatting pipeline[C]//2020 IEEE International Solid-State Circuits Conference(ISSCC).IEEE,2020.
[12]HUANG J,QU L,JIA R,et al.O2u-net:A simple noisy label detection approach for deep neural networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:3326-3334.
[13]HAN B,YAO Q,YU X,et al.Co-teaching:Robust training of deep neural networks with extremely noisy labels[J].arXiv:1804.06872,2018.
[14]TANAKA D,IKAMI D,YAMASAKI T,et al.Joint optimization framework for learning with noisy labels[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:5552-5560.
[15]LI J,XIONG C,SOCHER R,et al.Towards noise-resistant object detection with noisy annotations[J].arXiv:2003.01285,2020.
[16]BOCHKOVSKIY A,WANG C Y,LIAO H Y M.Yolov4:Optimal speed and accuracy of object detection[J].arXiv:2004.10934,2020.
[17]LIU K,QIAN X,WANG Z Q.Survey on active learning algorithms[J].Computer Engineering and Applications,2012,48(34):1-4.
[18]XIE Y,TOMIZUKA M,ZHAN W.Towards general and efficient active learning[J].arXiv:2112.07963,2021.
[19]PAN S J,YANG Q.A survey on transfer learning[J].IEEE Transactions on Knowledge and Data Engineering,2009,22(10):1345-1359.
[20]GANIN Y,USTINOVA E,AJAKAN H,et al.Domain-adversarial training of neural networks[J].The journal of machine learning research,2016,17(1):2096-2030.
[21]JIANG J,CHEN B,WANG J,et al.Decoupled adaptation for cross-domain object detection[J].arXiv:2110.02578,2021.
[22]VAN DER AALST W M P,RUBIN V,VERBEEK H M W,et al.Process mining:a two-step approach to balance between underfitting and overfitting[J].Software & Systems Modeling,2010,9:87-111.
[23]YU F,CHEN H,WAN G X,et al.Bdd100k:A diverse driving dataset for heterogeneous multitask learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:2636-2645.
[24]HAN J,LIANG X,XU H,et al.SODA10M:a large-scale 2D self/semi-supervised object detection dataset for autonomous driving[J].arXiv:2106.11118,2021.
[25]TARVAINEN A,VALPOLA H.Mean teachers are better role models:Weight-averaged consistency targets improve semi-supervised deep learning results[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.2017:1195-1204.
[1] WANG Jiahui, PENG Guangling, DUAN Liang, YUAN Guowu, YUE Kun. Few-shot Shadow Removal Method for Text Recognition [J]. Computer Science, 2024, 51(9): 147-154.
[2] LI Yunchen, ZHANG Rui, WANG Jiabao, LI Yang, WANG Ziqi, CHEN Yao. Re-parameterization Enhanced Dual-modal Realtime Object Detection Model [J]. Computer Science, 2024, 51(9): 162-172.
[3] HUANG Xiaofei, GUO Weibin. Multi-modal Fusion Method Based on Dual Encoders [J]. Computer Science, 2024, 51(9): 207-213.
[4] LIU Qian, BAI Zhihao, CHENG Chunling, GUI Yaocheng. Image-Text Sentiment Classification Model Based on Multi-scale Cross-modal Feature Fusion [J]. Computer Science, 2024, 51(9): 258-264.
[5] PU Bin, LIANG Zhengyou, SUN Yu. Monocular 3D Object Detection Based on Height-Depth Constraint and Edge Fusion [J]. Computer Science, 2024, 51(8): 192-199.
[6] LI Jiaying, LIANG Yudong, LI Shaoji, ZHANG Kunpeng, ZHANG Chao. Study on Algorithm of Depth Image Super-resolution Guided by High-frequency Information of Color Images [J]. Computer Science, 2024, 51(7): 197-205.
[7] LOU Zhengzheng, ZHANG Xin, HU Shizhe, WU Yunpeng. Foggy Weather Object Detection Method Based on YOLOX_s [J]. Computer Science, 2024, 51(7): 206-213.
[8] ZHENG Shenhai, GAO Xi, LIU Pengwei, LI Weisheng. Occluded Video Instance Segmentation Method Based on Feature Fusion of Tracking and Detection in Time Sequence [J]. Computer Science, 2024, 51(6A): 230600186-6.
[9] LIU Hongli, WANG Yulin, SHAO Lei, LI Ji. Study on Monocular Vision Vehicle Ranging Based on Lower Edge of Detection Frame [J]. Computer Science, 2024, 51(6A): 231000077-6.
[10] CHEN Yuzhang, WANG Shiqi, ZHOU Wen, ZHOU Wanting. Small Object Detection for Fish Based on SPD-Conv and NAM Attention Module [J]. Computer Science, 2024, 51(6A): 230500176-7.
[11] QUE Yue, GAN Menghan, LIU Zhiwei. Object Detection with Receptive Field Expansion and Multi-branch Aggregation [J]. Computer Science, 2024, 51(6A): 230600151-6.
[12] LOU Ren, HE Renqiang, ZHAO Sanyuan, HAO Xin, ZHOU Yueqi, WANG Xinyuan, LI Fangfang. Single Stage Unsupervised Visible-infrared Person Re-identification [J]. Computer Science, 2024, 51(6A): 230600138-7.
[13] JIAO Ruodan, GAO Donghui, HUANG Yanhua, LIU Shuo, DUAN Xuanfei, WANG Rui, LIU Weidong. Study and Verification on Few-shot Evaluation Methods for AI-based Quality Inspection in Production Lines [J]. Computer Science, 2024, 51(6A): 230700086-8.
[14] LI Yuehao, WANG Dengjiang, JIAN Haifang, WANG Hongchang, CHENG Qinghua. LiDAR-Radar Fusion Object Detection Algorithm Based on BEV Occupancy Prediction [J]. Computer Science, 2024, 51(6): 215-222.
[15] LIAO Junshuang, TAN Qinhong. DETR with Multi-granularity Spatial Attention and Spatial Prior Supervision [J]. Computer Science, 2024, 51(6): 239-246.