Computer Science ›› 2023, Vol. 50 ›› Issue (6A): 220400199-7. doi: 10.11896/jsjkx.220400199

• Image Processing & Multimedia Technology •

Cross-dataset Learning Combining Multi-object Tracking and Human Pose Estimation

ZENG Zehua, LUO Huilan   

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Online: 2023-06-10 Published: 2023-06-12
  • About author: ZENG Zehua, born in 1995, postgraduate. His main research interests include multi-object tracking and human pose estimation. LUO Huilan, born in 1974, Ph.D., professor. Her main research interests include machine learning and pattern recognition.
  • Supported by:
    National Natural Science Foundation of China (61862031) and Jiangxi Provincial Foundation for Leaders of Disciplines in Science, Leading Talents Program (20213BCJ22004).

Abstract: In recent years, multi-object tracking has made significant progress, especially for pedestrians. Jointly estimating pedestrian poses can improve the motion prediction of multi-object tracking algorithms while providing more information for higher-level tasks such as autonomous driving. However, the existing multi-object tracking datasets that contain human pose estimation labels have short videos and sparse targets, which limits research on multi-object tracking. In this paper, cross-dataset learning is performed using the multi-object tracking dataset MOT17 and the multi-person pose estimation dataset COCO, which contain more pedestrians. A round-robin training strategy effectively improves the performance of the multi-object tracking algorithm under joint human pose estimation. Using polarized self-attention down-sampling together with attention up-sampling enhances the algorithm's human pose estimation performance while also speeding up training.
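
The abstract does not spell out how the round-robin training strategy is scheduled. As a minimal sketch only, one common reading is that training steps alternate between the two datasets and each step supervises only the outputs that dataset has labels for. The names `round_robin_batches` and `SUPERVISED_HEADS`, and the assignment of heads to datasets, are illustrative assumptions and are not taken from the paper.

```python
from itertools import cycle
from typing import Dict, Iterable, Iterator, Tuple

# Assumption: which outputs each dataset can supervise. MOT17 provides boxes
# and track identities; COCO provides person boxes and keypoints. How the
# paper actually combines these losses is not stated in the abstract.
SUPERVISED_HEADS: Dict[str, Tuple[str, ...]] = {
    "mot17": ("detection", "tracking"),
    "coco": ("detection", "pose"),
}


def round_robin_batches(
    loaders: Dict[str, Iterable], steps: int
) -> Iterator[Tuple[str, object]]:
    """Yield (dataset_name, batch) pairs, switching dataset at every step.

    Each loader is wrapped in itertools.cycle, so a shorter dataset simply
    restarts while a longer one continues. Note that cycle() keeps a copy of
    every item it has seen, which is fine for this toy example but would need
    replacing (e.g. by re-creating the iterator) for real data loaders.
    """
    iterators = {name: cycle(loader) for name, loader in loaders.items()}
    names = list(loaders)
    for step in range(steps):
        name = names[step % len(names)]
        yield name, next(iterators[name])
```

In a training loop built on this sketch, each yielded batch would be passed through the shared network and only the losses listed in `SUPERVISED_HEADS[name]` would be computed for that step.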

Key words: Multi-object tracking, Cross-dataset learning, Human pose estimation, Attention mechanism

CLC Number: 

  • TP183