Computer Science (计算机科学) ›› 2023, Vol. 50 ›› Issue (10): 96-103. doi: 10.11896/jsjkx.220900075
WANG Lin1, LIU Zhe1, SHI Dianxi1,2,3, ZHOU Chenlei3, YANG Shaowu1, ZHANG Yongjun2
Abstract: Object tracking is a fundamental research problem in computer vision. As the mainstream sensor for object tracking, conventional frame-based cameras provide rich scene information. However, limited by their sampling principle, conventional cameras suffer from over-exposure or under-exposure under extreme lighting conditions and from motion blur in high-speed scenes. The event camera, by contrast, is a bio-inspired sensor that perceives changes in light intensity and outputs an event stream; it offers high dynamic range and high temporal resolution, but struggles to capture static targets. Motivated by the complementary characteristics of conventional and event cameras, this paper proposes a dual-modality fusion method for single-object tracking, called Fusion Tracker. The method adaptively fuses visual cues from conventional-camera and event-camera data through feature enhancement, and designs an attention-based feature-matching network that matches target cues in the template frame against the search frame, establishing long-range feature associations so that the tracker attends to target information. Fusion Tracker alleviates the semantic loss caused by correlation operations during feature matching and thereby improves tracking performance. Experiments on two public datasets demonstrate the superiority of the proposed method, and ablation studies validate the effectiveness of the key components of Fusion Tracker. Fusion Tracker effectively improves the robustness of object tracking in complex scenes and provides reliable tracking results for downstream applications.
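The abstract describes matching template-frame target cues against the search frame with an attention mechanism, replacing plain correlation. As an illustration only (the paper's actual network is not specified here), the sketch below shows the generic cross-attention pattern such a matcher builds on: queries come from the search-frame features, keys and values from the template frame, so each search location aggregates template cues weighted by similarity. All shapes and names (`cross_attention`, token counts, feature dimension) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(search, template, d_k):
    """Search-frame tokens attend to template-frame tokens:
    Q = search features, K = V = template features."""
    scores = search @ template.T / np.sqrt(d_k)   # (Ns, Nt) scaled similarities
    weights = softmax(scores, axis=-1)            # per-search-token match distribution
    return weights @ template                     # template cues aggregated per search token

# Hypothetical sizes: 64 template tokens, 256 search tokens, 32-dim features.
rng = np.random.default_rng(0)
template = rng.standard_normal((64, 32))
search = rng.standard_normal((256, 32))
matched = cross_attention(search, template, d_k=32)
print(matched.shape)  # (256, 32)
```

Unlike a single correlation map, the attention output keeps the full feature dimension per search location, which is one way such designs avoid the semantic loss the abstract attributes to correlation operations.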
CLC Number: