Research Progress of Deep Learning Methods in Two-dimensional Human Pose Estimation

doi:10.11896/jsjkx.210900041

Abstract

Human pose estimation aims to locate and detect human body keypoints in images or videos. It has long been an active research topic in computer vision and is a key step toward enabling computers to understand human actions. In recent years, two-dimensional (2D) human keypoint prediction in images and videos has been widely applied in many fields. By exploiting the strong image feature extraction capability of deep learning, 2D human pose estimation has improved in robustness and accuracy while reducing processing time, and it performs far better than traditional methods. According to the number of people being analyzed, 2D human pose estimation methods can be divided into single-person and multi-person methods. For single-person pose estimation, depending on how the extracted keypoints are represented, methods fall into coordinate regression methods, which directly predict keypoint coordinates, and heatmap-based detection methods, which predict a Gaussian distribution for each keypoint. For multi-person pose estimation, methods are divided into top-down methods, which reduce the multi-person problem to a series of single-person problems, and bottom-up methods, which process the keypoints of all people directly. This paper summarizes existing human pose estimation methods, explains the internal mechanisms and execution processes of their network structures, analyzes commonly used datasets and evaluation metrics, and finally discusses current problems and future development trends.

Key words: Two-dimensional human pose estimation, Deep learning, Single-person pose estimation, Multi-person pose estimation, Evaluation metrics
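
The abstract's distinction between coordinate regression and heatmap-based detection can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the grid size, the keypoint position, and the value of `sigma` are all assumptions chosen for illustration. A regression target is simply the (normalized) coordinate pair, whereas a heatmap target is a 2D Gaussian centered on the keypoint, from which coordinates are recovered by locating the peak.

```python
import math


def regression_target(x, y, width, height):
    """Coordinate-regression style target: normalized (x, y) in [0, 1)."""
    return (x / width, y / height)


def encode_heatmap(x, y, width, height, sigma=2.0):
    """Heatmap style target: a 2D Gaussian centered on the keypoint.

    The result is a list of rows, so it is indexed as heatmap[row][col],
    i.e. heatmap[y][x]. sigma is an assumed spread, not a value from the paper.
    """
    return [
        [
            math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2.0 * sigma ** 2))
            for col in range(width)
        ]
        for row in range(height)
    ]


def decode_heatmap(heatmap):
    """Recover integer (x, y) coordinates as the location of the peak."""
    best_value, best_xy = -1.0, (0, 0)
    for row, values in enumerate(heatmap):
        for col, value in enumerate(values):
            if value > best_value:
                best_value, best_xy = value, (col, row)
    return best_xy


# Hypothetical keypoint at (x=12, y=7) on a 32x24 grid.
target = encode_heatmap(12, 7, width=32, height=24)
print(regression_target(12, 7, 32, 24))
print(decode_heatmap(target))  # (12, 7)
```

In this sketch the peak decoding is limited to integer grid positions, which is one reason heatmap methods in practice are often paired with some form of sub-pixel refinement; that step is omitted here.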

CLC Number:

TP391

ZHANG Guo-ping, MA Nan, GUAN Huai-guang, WU Zhi-xuan. Research Progress of Deep Learning Methods in Two-dimensional Human Pose Estimation[J]. Computer Science, 2022, 49(12): 219-228. https://doi.org/10.11896/jsjkx.210900041

