Computer Science ›› 2023, Vol. 50 ›› Issue (1): 138-146. doi: 10.11896/jsjkx.211000083

• Computer Graphics & Multimedia •


AFTM: Anchor-free Object Tracking Method with Attention Features

LI Xuehui1, ZHANG Yongjun1, SHI Dianxi1,2,3, XU Huachi1, SHI Yanyan2   

  1 National Innovation Institute of Defense Technology, Beijing 100071, China
    2 College of Computer, National University of Defense Technology, Changsha 410073, China
    3 Tianjin Artificial Intelligence Innovation Center, Tianjin 300457, China
  • Received: 2021-10-14  Revised: 2022-04-15  Online: 2023-01-15  Published: 2023-01-09
  • About author: LI Xuehui, born in 1997, postgraduate. Her main research interests include computer vision and object tracking. E-mail: xhli_niidt@163.com
    ZHANG Yongjun (corresponding author), born in 1966, Ph.D., professor. His main research interests include artificial intelligence, multi-agent cooperation, machine learning and feature recognition. E-mail: yjzhang@nudt.edu.cn
  • Supported by:
    National Key Research and Development Program of China (2017YFB1001901) and Science and Technology Commission of Tianjin Binhai New Area (BHXQKJXM-PT-RGZNJMZX-2019001).



Abstract: As an important branch of computer vision, object tracking has been widely applied in fields such as intelligent video surveillance, human-computer interaction and autonomous driving. Although object tracking has made considerable progress in recent years, tracking in complex environments remains challenging: occlusion, object deformation and illumination change still lead to inaccurate and unstable tracking. This paper proposes AFTM, an anchor-free object tracking method that fuses attention features. Firstly, an adaptively generated group of attention weight factors is constructed for the classification and regression branches, realizing an efficient adaptive fusion strategy for the response maps and improving the accuracy of object localization and bounding-box scale estimation. Secondly, to address class imbalance in the training data, a dynamically scaled cross entropy loss is used as the loss function of the object localization network, which corrects the optimization direction of the model and makes tracking more stable and reliable. Finally, a corresponding learning rate adjustment strategy is designed and the weights of a number of models are stochastically averaged, which enhances the generalization ability of the model. Experimental results on public datasets show that AFTM achieves higher accuracy and more stable tracking performance in complex tracking environments.
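
The "dynamically scaled cross entropy loss" mentioned above for handling class imbalance corresponds to the focal loss formulation, which rescales the standard cross entropy so that easy, well-classified background locations contribute little and the scarce foreground samples dominate the gradient. The following PyTorch sketch is illustrative only and is not the authors' implementation; the function name, the alpha/gamma defaults and the dummy response-map shapes are assumptions for demonstration.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Dynamically scaled (focal) binary cross entropy over a classification response map.

    logits  -- raw classification scores for each spatial location
    targets -- float 0/1 labels of the same shape (1 = object/foreground region)
    alpha, gamma -- illustrative defaults taken from the focal-loss literature
    """
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability assigned to the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing factor
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()      # easy samples are down-weighted by (1 - p_t)^gamma

# Usage on a dummy 25x25 response map with a small positive (object-centre) region:
scores = torch.randn(1, 1, 25, 25)
labels = torch.zeros(1, 1, 25, 25)
labels[..., 10:15, 10:15] = 1.0
print(focal_loss(scores, labels))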

Key words: Deep learning, Object tracking, Siamese network, Anchor-free, Attention mechanism

CLC number: TP391.41