Computer Science, 2022, Vol. 49, Issue (12): 219-228. doi: 10.11896/jsjkx.210900041

• Computer Graphics & Multimedia •

Research Progress of Deep Learning Methods in Two-dimensional Human Pose Estimation

ZHANG Guo-ping1,3, MA Nan2, GUAN Huai-guang1, WU Zhi-xuan1

  1 Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
  2 Department of Information Science, Beijing University of Technology, Beijing 100124, China
  3 College of Robotics, Beijing Union University, Beijing 100101, China
  • Received: 2021-09-06 Revised: 2022-01-30 Published: 2022-12-14
  • About author: ZHANG Guo-ping, born in 1995, master. His main research interests include human pose estimation, interactive cognition and action recognition. MA Nan, born in 1978, Ph.D., professor. Her main research interests include interactive cognition, intelligent driving, knowledge discovery and intelligent systems.
  • Supported by: National Natural Science Foundation of China (61871038, 61931012).

Abstract: Human pose estimation is the task of detecting and locating the key points of the human body in images or videos. It has long been an active research direction in computer vision and is a key step toward enabling computers to understand human actions. In recent years, predicting two-dimensional human body key points in images and videos has found wide application. By exploiting the strong image feature extraction capabilities of deep learning, two-dimensional human pose estimation has improved in robustness, accuracy, and processing time, and its performance far exceeds that of traditional methods. According to the number of people in the scene, two-dimensional human pose estimation methods can be divided into single-person and multi-person methods. Single-person methods are further distinguished by how the key points are represented: coordinate regression methods predict the key point coordinates directly, whereas heat map detection methods predict a Gaussian distribution for each key point. Multi-person methods are divided into top-down methods, which reduce the multi-person problem to a series of single-person problems, and bottom-up methods, which process the key points of all people directly. Building on existing human pose estimation methods, this paper analyzes the internal mechanisms of their network structures, reviews the commonly used datasets and evaluation metrics, and discusses current problems and future development trends.
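To make the heat map representation mentioned in the abstract concrete, the following is a minimal sketch and not code from the paper. It assumes NumPy, a single key point, and an illustrative standard deviation `sigma`; the function names `encode_heatmap` and `decode_heatmap` are hypothetical. A coordinate regression method would instead output the pair (x, y) directly.

```python
import numpy as np


def encode_heatmap(x, y, height, width, sigma=2.0):
    """Render one key point as a 2D Gaussian centred on (x, y).

    sigma is an illustrative choice, not a value taken from the paper.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))


def decode_heatmap(heatmap):
    """Recover the key point as the location of the heat map maximum."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col), int(row)


# Encode a key point at x=20, y=12 on a 64x48 (height x width) map,
# then decode it back from the peak.
heatmap = encode_heatmap(x=20, y=12, height=64, width=48)
print(decode_heatmap(heatmap))
```

In this sketch the decoded location is limited to integer pixel positions, which is one reason heat map methods often add a sub-pixel refinement step after the argmax.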

Key words: Two-dimensional human pose estimation, Deep learning, Single-person pose estimation, Multi-person pose estimation, Evaluation metrics

CLC Number: 

  • TP391