Computer Science ›› 2023, Vol. 50 ›› Issue (10): 96-103.doi: 10.11896/jsjkx.220900075

• Computer Graphics & Multimedia •

Fusion Tracker: Single-object Tracking Framework Fusing Image Features and Event Features

WANG Lin1, LIU Zhe1, SHI Dianxi1,2,3, ZHOU Chenlei3, YANG Shaowu1, ZHANG Yongjun2   

  1 School of Computer Science,National University of Defense Technology,Changsha 410073,China
    2 National Innovation Institute of Defense Technology,Academy of Military Sciences,Beijing 100166,China
    3 Tianjin Artificial Intelligence Innovation Center,Tianjin 300457,China
  • Received:2022-09-08 Revised:2022-12-09 Online:2023-10-10 Published:2023-10-10
  • About author: WANG Lin, born in 1998, postgraduate. His main research interests include event cameras, deep learning and computer vision. SHI Dianxi, born in 1966, Ph.D, professor, Ph.D supervisor. His main research interests include distributed object middleware technology, adaptive software technology, artificial intelligence, and robot operating systems.
  • Supported by: National Natural Science Foundation of China (91948303).

Abstract: Object tracking is a fundamental research problem in computer vision. As the mainstream sensor for object tracking, conventional cameras provide rich scene information. However, owing to their sampling principle, conventional cameras suffer from overexposure or underexposure under extreme lighting conditions and from motion blur in high-speed motion scenes. In contrast, an event camera is a bionic sensor that senses changes in light intensity and outputs an event stream; it offers high dynamic range and high temporal resolution, but struggles to capture static targets. Motivated by the complementary characteristics of conventional and event cameras, a dual-modal fusion single-object tracking method, called fusion tracker, is proposed. The method adaptively fuses visual cues from conventional and event camera data through feature enhancement, and designs an attention-based feature matching network that matches object cues in the template frame against the search frame, establishing long-range feature associations and keeping the tracker focused on object information. The fusion tracker avoids the semantic loss caused by correlation operations during feature matching and thereby improves tracking performance. Experiments on two publicly available datasets demonstrate the superiority of the proposed approach, and ablation studies validate the effectiveness of the key components of the fusion tracker. The fusion tracker effectively improves the robustness of object tracking in complex scenarios and provides reliable tracking results for downstream applications.
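The details of the paper's feature matching network are not given on this page; as a rough, hypothetical illustration of the general idea it describes (matching template-frame features against search-frame features via attention rather than a local correlation operation), a minimal single-head cross-attention sketch in NumPy might look like the following. The function and variable names are illustrative assumptions, not the authors' API.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(search_feat, template_feat):
    """Match search-frame tokens against template-frame tokens.

    Each search location attends over ALL template locations, so the
    matching is global, unlike a sliding correlation window.
    search_feat:   (Ns, d) feature vectors from the search frame
    template_feat: (Nt, d) feature vectors from the template frame
    returns:       (Ns, d) template-conditioned search features
    """
    d = search_feat.shape[-1]
    scores = search_feat @ template_feat.T / np.sqrt(d)  # (Ns, Nt)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    return weights @ template_feat                       # (Ns, d)

# toy example: 6 search tokens, 4 template tokens, 8-dim features
rng = np.random.default_rng(0)
search = rng.standard_normal((6, 8))
template = rng.standard_normal((4, 8))
out = cross_attention(search, template)
print(out.shape)  # (6, 8)
```

In a real tracker the features would come from a shared backbone (e.g. a ResNet) over both modalities, and the attention would typically use learned query/key/value projections and multiple heads; the sketch only shows why attention preserves per-location semantics that a plain correlation score would collapse.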

Key words: Object tracking, Deep learning, Event cameras, Feature fusion, Attention mechanisms

CLC Number: TP391