Computer Science, 2023, Vol. 50, Issue (11A): 221000048-7. doi: 10.11896/jsjkx.221000048
ZHENG Quanshi¹, JIN Cheng¹,²
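Abstract: Regression-based 2D human pose estimation methods directly predict the 2D coordinates of human keypoints and are one of the mainstream approaches to 2D pose estimation. Transformers can effectively model the relationships between body parts, and applying them has significantly improved the accuracy of regression methods. However, existing methods of this kind have two problems: 1) in the cross-attention module, a fixed Query struggles to attend accurately to the keypoint regions of different images, so the attention becomes scattered; 2) learning the annotated keypoint positions directly causes the model to overfit the training-set annotations and generalize poorly. This paper proposes a pose estimation model based on adaptive prediction to address both problems. For the first, the model adaptively predicts the region each Query should attend to and guides the attention to concentrate on that region. For the second, the model adaptively predicts a distribution describing how likely the keypoint is to appear at every position, and this soft prediction alleviates overfitting to the annotations. Experiments on the MS-COCO dataset show that the model improves the accuracy of the baseline method by 2.8% and exceeds the best accuracy among related methods by 0.2%.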
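The two mechanisms described in the abstract can be illustrated with a small, framework-free sketch. It is not the authors' implementation: the abstract does not say how a Query's region is represented or how the position distribution is decoded, so both choices here are assumptions. `focused_cross_attention` models the predicted region as an isotropic Gaussian (centre `center`, width `sigma`) whose log-density is added to the scaled dot-product scores, in the spirit of spatially modulated attention. `soft_keypoint` illustrates soft prediction with soft-argmax (integral regression), returning the full distribution over positions together with its expected coordinate. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focused_cross_attention(query: Sequence[float],
                            keys: Sequence[Sequence[float]],
                            values: Sequence[Sequence[float]],
                            positions: Sequence[Point],
                            center: Point,
                            sigma: float) -> List[float]:
    """Single-query cross-attention biased toward a predicted region.

    ASSUMPTION (hypothetical design, not from the paper): the region is an
    isotropic Gaussian at `center` with width `sigma`; the bias
    -d^2 / (2 * sigma^2) is added to each scaled dot-product score, so
    positions far from the centre receive less attention weight.
    """
    d = len(query)
    logits = []
    for k, (x, y) in zip(keys, positions):
        score = sum(qi * ki for qi, ki in zip(query, k)) / math.sqrt(d)
        dist2 = (x - center[0]) ** 2 + (y - center[1]) ** 2
        logits.append(score - dist2 / (2 * sigma ** 2))
    weights = softmax(logits)
    dim_v = len(values[0])
    # Weighted sum of value vectors, one output channel at a time.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim_v)]


def soft_keypoint(logits: Sequence[float],
                  positions: Sequence[Point]) -> Tuple[List[float], Point]:
    """Turn per-position scores into a distribution and its expected location.

    ASSUMPTION: 'soft prediction' is illustrated with soft-argmax; the
    keypoint is the probability-weighted mean of all candidate positions
    instead of a single hard choice.
    """
    probs = softmax(logits)
    x = sum(p * px for p, (px, _) in zip(probs, positions))
    y = sum(p * py for p, (_, py) in zip(probs, positions))
    return probs, (x, y)


if __name__ == "__main__":
    grid = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    _, location = soft_keypoint([0.0, 0.0, 0.0, 0.0], grid)
    print(location)  # (0.5, 0.5)
```

A smaller `sigma` concentrates attention more tightly around the predicted centre, while a very large `sigma` recovers ordinary, unbiased cross-attention. In `soft_keypoint` every position contributes to the output through its probability rather than only the single annotated point, which is the kind of softened supervision the abstract appeals to.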