Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 221000048-7. doi: 10.11896/jsjkx.221000048

• Image Processing & Multimedia Technology •

2D Human Pose Estimation Based on Adaptive Estimation

ZHENG Quanshi1, JIN Cheng1,2

  1 School of Computer Science, Fudan University, Shanghai 200438, China
  2 Peng Cheng Laboratory, Shenzhen, Guangdong 518066, China
  • Published: 2023-11-09
  • Corresponding author: JIN Cheng (jc@fudan.edu.cn)
  • About author: ZHENG Quanshi (qszheng20@fudan.edu.cn), born in 1994, postgraduate, is a member of China Computer Federation. His main research interests include human pose estimation and action recognition.
    JIN Cheng, born in 1978, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include computer vision and multimedia information retrieval.
  • Supported by:
    Shanghai Science and Technology Innovation Action Plan, Shanghai Municipal Science and Technology Commission (22dz1204900).

Abstract: Regression-based 2D human pose estimation methods directly predict the 2D coordinates of human keypoints and are one of the mainstream approaches to 2D pose estimation. The Transformer can effectively model the relationships between human body parts, and its application has significantly improved the accuracy of regression-based methods. However, existing Transformer-based methods have two problems: 1) in the cross-attention module, a fixed query cannot accurately attend to the different keypoint regions of different images, so the attention is scattered; 2) they directly learn the annotated keypoint positions, so the model overfits the training-set annotations and generalizes poorly. This paper proposes a pose estimation model based on adaptive prediction to solve these two problems. For the first problem, the model adaptively predicts the query's region of attention and directs the attention to that region. For the second problem, the model adaptively predicts the probability distribution of each keypoint over all positions, and this soft prediction alleviates overfitting to the annotations. Experiments on the MS-COCO dataset show that the model improves the accuracy of the baseline method by 2.8% and improves the highest accuracy of related methods by 0.2%.

Key words: 2D human pose estimation, Regression-based, Adaptive, Region of attention, Probability distribution
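
The two ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the region of attention or the probability distribution is parameterized, so the Gaussian bias around a predicted center, the `sigma` parameter, and the function names `region_guided_attention` and `soft_prediction` are all assumptions made for illustration. The first function pulls attention weights toward a predicted region; the second turns a distribution over positions into a coordinate by taking its expectation.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def region_guided_attention(scores, positions, center, sigma):
    """Bias attention scores toward a predicted region.

    Assumption: the region is modeled as an isotropic Gaussian around
    `center` with width `sigma`; the paper may use a different form.
    """
    biased = []
    for score, (x, y) in zip(scores, positions):
        dist2 = (x - center[0]) ** 2 + (y - center[1]) ** 2
        biased.append(score - dist2 / (2.0 * sigma ** 2))
    return softmax(biased)


def soft_prediction(probs, positions):
    """Return the probability-weighted mean (x, y) over all positions."""
    x = sum(p * px for p, (px, _) in zip(probs, positions))
    y = sum(p * py for p, (_, py) in zip(probs, positions))
    return x, y


# A 3x3 grid of positions in row-major order, with uninformative raw scores.
positions = [(float(x), float(y)) for y in range(3) for x in range(3)]
weights = region_guided_attention([0.0] * 9, positions, center=(1.0, 1.0), sigma=1.0)
keypoint = soft_prediction(weights, positions)
```

With uniform raw scores, the weights are determined entirely by the assumed region prior, so the largest weight falls on the grid cell at the predicted center. A smaller `sigma` concentrates the attention more tightly around that center.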

CLC Number: 

  • TP391