Computer Science ›› 2023, Vol. 50 ›› Issue (6A): 220400199-7. doi: 10.11896/jsjkx.220400199

• Image Processing & Multimedia Technology •

Cross-dataset Learning Combining Multi-object Tracking and Human Pose Estimation

ZENG Zehua, LUO Huilan

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Online: 2023-06-10  Published: 2023-06-12
  • Corresponding author: ZENG Zehua (deemozen@163.com)
  • About author: ZENG Zehua, born in 1995, postgraduate. His main research interests include multi-object tracking and human pose estimation. LUO Huilan, born in 1974, Ph.D, professor. Her main research interests include machine learning and pattern recognition.
  • Supported by: National Natural Science Foundation of China (61862031) and Jiangxi Provincial Foundation for Leaders of Disciplines in Science, Leading Talents Program (20213BCJ22004).

Abstract: In recent years, multi-object tracking has made significant progress, especially for pedestrians. Performing joint pose estimation on pedestrians improves the motion prediction of multi-object tracking algorithms, while providing more information for higher-order tasks such as autonomous driving. However, in current multi-object tracking datasets that contain human pose estimation labels, the videos are short and the targets are sparse, which limits research on multi-object tracking algorithms. In this paper, cross-dataset learning is performed using the multi-object tracking dataset MOT17, which contains more pedestrians, and the multi-person pose estimation dataset COCO. Based on a round-robin training strategy, the performance of the multi-object tracking algorithm under joint human pose estimation is effectively improved. Meanwhile, the use of polarized self-attention down-sampling and attention up-sampling speeds up training while enhancing the algorithm's human pose estimation performance.
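The round-robin (cyclic) training strategy described above alternates optimization steps between the tracking dataset (MOT17) and the pose dataset (COCO). A minimal sketch of such a batch scheduler, assuming each dataset is exposed as an iterable of batches; which losses are active for each dataset's batches is not specified here and is omitted:

```python
import itertools

def round_robin_batches(loaders):
    """Alternate one batch at a time across several dataset loaders.

    `loaders` maps a dataset name to an iterable of batches. Each
    training step draws the next batch from the next dataset in turn,
    so both datasets contribute gradient updates at the same rate.
    A loader simply drops out of the rotation once it is exhausted.
    """
    iters = {name: iter(batches) for name, batches in loaders.items()}
    for name in itertools.cycle(list(iters)):
        if not iters:                 # every loader exhausted: stop
            return
        if name not in iters:         # this loader already ran dry
            continue
        try:
            yield name, next(iters[name])
        except StopIteration:
            del iters[name]
```

For two loaders of unequal length, e.g. `{"det": [1, 2, 3], "pose": [10, 20]}`, the scheduler yields `det, pose, det, pose, det`, interleaving strictly until the shorter loader is exhausted.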

Key words: Multi-object tracking, Cross-dataset learning, Human pose estimation, Attention mechanism
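The abstract also mentions replacing plain down-sampling with attention-based down-sampling. As a toy illustration of the general idea (not the paper's polarized self-attention, which operates on full channel/spatial branches of a CNN feature tensor), each 2x2 window below is reduced by a softmax-weighted sum of its own activations rather than a uniform average, so salient activations dominate the pooled value:

```python
import math

def attention_pool_2x2(feature):
    """Down-sample a 2-D feature map (list of lists) by a factor of 2,
    weighting each 2x2 window with a softmax over its own activations
    instead of averaging uniformly."""
    h, w = len(feature), len(feature[0])
    out = []
    for i in range(0, h - 1, 2):
        row = []
        for j in range(0, w - 1, 2):
            window = [feature[i][j], feature[i][j + 1],
                      feature[i + 1][j], feature[i + 1][j + 1]]
            m = max(window)                      # stabilise the softmax
            exps = [math.exp(v - m) for v in window]
            s = sum(exps)
            weights = [e / s for e in exps]      # attention weights
            row.append(sum(a * v for a, v in zip(weights, window)))
        out.append(row)
    return out
```

On a window such as `[0, 0, 0, 10]` the pooled value lands near 10 rather than the uniform average of 2.5, which is the property that helps preserve keypoint evidence through down-sampling.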

CLC Number: TP183