Computer Science ›› 2021, Vol. 48 ›› Issue (4): 123-129. doi: 10.11896/jsjkx.200800164

• Computer Graphics & Multimedia •

Object Tracking Algorithm Based on Temporal-Spatial Attention Mechanism

CHENG Xu1,2, CUI Yi-ping1,2, SONG Chen1,2, CHEN Bei-jing1,2, ZHENG Yu-hui1,2, SHI Jin-gang3

  1 School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
  2 Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing 210044, China
  3 School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China
  • Received: 2020-06-24 Revised: 2020-10-15 Online: 2021-04-15 Published: 2021-04-09
  • Corresponding author: CHENG Xu (xcheng@nuist.edu.cn)
  • About author: CHENG Xu, born in 1983, Ph.D, associate professor. His main research interests include computer vision, object tracking and pattern recognition.
  • Supported by:
    National Natural Science Foundation of China (61802058, 61911530397, 62072251), Postdoctoral Research Foundation of China (2019M651650) and Startup Foundation for Introducing Talent of Nanjing University of Information Science and Technology (2018r057).

Abstract: Object tracking is widely used in intelligent surveillance, human-computer interaction, autonomous driving and many other fields. In recent years, many efficient tracking methods have been proposed. However, as tracking environments grow more complex, tracking methods still face great challenges in scenarios involving occlusion, illumination variation and background clutter, which can lead to tracking failure. To address these problems, this paper proposes an object tracking algorithm based on a temporal-spatial attention mechanism. First, a Siamese network architecture is used to improve the discriminative ability of the object features. Then, improved channel attention and spatial attention modules are introduced into the Siamese network; they assign different weights to features at different channels and spatial positions, focusing on the features that benefit tracking. In addition, an efficient online template update mechanism is developed, which fuses the features of the first frame with the features of subsequent frames that have high confidence, reducing the risk of object drift. Finally, the proposed algorithm is evaluated on the OTB2013 and OTB2015 benchmarks. Experimental results show that its performance improves by 6.3% compared with current mainstream tracking algorithms.

Key words: Attention mechanism, Deep learning, Object tracking, Siamese network, Template update
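
The three ideas named in the abstract (channel reweighting, spatial reweighting, and confidence-gated template fusion) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper's improved attention modules are learned, whereas the functions below have no trainable parameters, and the names `channel_attention`, `spatial_attention`, `fuse_template`, `threshold` and `rate` are hypothetical choices made only for this illustration.

```python
import math
from typing import List

# Feature map indexed as [channel][row][col]; a stand-in for a CNN feature tensor.
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat: FeatureMap) -> FeatureMap:
    """Scale each channel by sigmoid(global average of that channel).

    Assumption: a parameter-free stand-in for a learned channel attention module.
    """
    out = []
    for channel in feat:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values))
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(feat: FeatureMap) -> FeatureMap:
    """Scale each position by sigmoid(mean over channels at that position).

    Assumption: a parameter-free stand-in for a learned spatial attention module.
    """
    n_ch, n_rows, n_cols = len(feat), len(feat[0]), len(feat[0][0])
    weights = [[sigmoid(sum(feat[c][i][j] for c in range(n_ch)) / n_ch)
                for j in range(n_cols)] for i in range(n_rows)]
    return [[[feat[c][i][j] * weights[i][j] for j in range(n_cols)]
             for i in range(n_rows)] for c in range(n_ch)]


def fuse_template(first: FeatureMap, current: FeatureMap, confidence: float,
                  threshold: float = 0.8, rate: float = 0.5) -> FeatureMap:
    """Blend first-frame features with current-frame features when confidence is high.

    `threshold` and `rate` are hypothetical values, not taken from the paper.
    """
    if confidence < threshold:
        return first  # low confidence: keep the first-frame template unchanged
    return [[[(1.0 - rate) * a + rate * b for a, b in zip(ra, rb)]
             for ra, rb in zip(ca, cb)] for ca, cb in zip(first, current)]
```

Gating the fusion on a confidence score means that frames where the target is likely occluded or lost never contaminate the template, which is the drift-reduction argument the abstract makes; always retaining the first-frame features keeps one reliable reference in the blend.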

CLC Number:

  • TP301.6