Computer Science, 2022, Vol. 49, Issue (12): 219-228. doi: 10.11896/jsjkx.210900041
张国平1,3, 马楠2, 贯怀光1, 吴祉璇1
ZHANG Guo-ping1,3, MA Nan2, GUAN Huai-guang1, WU Zhi-xuan1
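Abstract: Human pose estimation aims to localize and detect human keypoints in images or videos. It has long been an active research topic in computer vision and is a key step toward enabling computers to understand human actions and behavior. In recent years, two-dimensional (2D) human keypoint prediction in images and videos has been widely applied in many fields. By exploiting the strong image feature extraction capability of deep learning, 2D human pose estimation has improved in robustness and accuracy while reducing processing time, and its performance far exceeds that of traditional methods. According to the number of people to be estimated, 2D human pose estimation methods can be divided into single-person and multi-person methods. For single-person pose estimation, depending on how the extracted keypoints are represented, one can use either coordinate regression methods, which directly predict the coordinates of human keypoints, or heatmap-based detection methods, which predict a Gaussian distribution for each keypoint. For multi-person pose estimation, methods are divided into top-down methods, which reduce the multi-person problem to single-person estimation, and bottom-up methods, which process the keypoints of all people directly. This paper summarizes existing human pose estimation methods, explains the internal mechanisms and execution processes of their network structures, analyzes commonly used datasets and evaluation metrics, and finally discusses current challenges and future development trends.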
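To make the heatmap-based keypoint representation mentioned in the abstract concrete, the following is a minimal sketch in Python with NumPy. It is not taken from the paper: the function names `gaussian_heatmap` and `decode_heatmap`, the 64×48 map size, and `sigma=2.0` are assumptions chosen for illustration. It encodes one keypoint as a 2D Gaussian target and recovers the coordinate by taking the location of the maximum response.

```python
import numpy as np


def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Build a (height, width) target map with a Gaussian peak at (cx, cy).

    Hypothetical helper for illustration; sigma is an assumed value.
    """
    ys, xs = np.mgrid[0:height, 0:width]  # pixel row and column indices
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def decode_heatmap(heatmap):
    """Return the (x, y) pixel location of the strongest response."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col), int(row)


# Encode a keypoint at x=20, y=30, then decode it again.
heatmap = gaussian_heatmap(64, 48, cx=20, cy=30)
x, y = decode_heatmap(heatmap)
```

A coordinate regression method would instead output the pair `(x, y)` directly from the network, without the intermediate map.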