Computer Science, 2022, Vol. 49, Issue (11A): 220100057-5. doi: 10.11896/jsjkx.220100057

• Image Processing & Multimedia Technology •

Study on Human Pose Estimation Based on Multiscale Dual Attention

MA Wan-yi, ZHANG De-ping

  1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211000, China
  • Online: 2022-11-10  Published: 2022-11-21
  • Corresponding author: ZHANG De-ping (depingzhang@163.com)
  • About author: MA Wan-yi (wanyi_ma@163.com), born in 1996, postgraduate, is a member of China Computer Federation. Her main research interests include image processing and artificial intelligence modeling.
    ZHANG De-ping, born in 1973, Ph.D., postgraduate supervisor, is a member of China Computer Federation. His main research interests include image processing and artificial intelligence modeling.
  • Supported by:
    National Defense Basic Scientific Research Key Program (JCKY2020605C003).

Abstract: Human pose estimation suffers from weak discrimination between the human body and the background, and HRNet-based methods do not fully exploit important feature information. To address these problems, this paper proposes MDA-HRNet, a human pose estimation method based on multiscale dual attention (MDA) that uses channel and spatial attention mechanisms. Working in both the channel domain and the spatial domain, the method designs the Ca-Neck and Ca-Block modules, which incorporate channel attention, and the Sa-Block module, which incorporates spatial attention. These modules are integrated into the high-resolution network structure so that the network focuses on the human body region in the image. In the Sa-Block module, 3×3 and 7×7 convolution kernels are used to derive spatial attention maps at two different scales, which strengthens the network's ability to distinguish human features from background features and thus to locate the human body and its keypoints accurately. The method is evaluated on the MPII dataset, and the results show that MDA-HRNet effectively improves the accuracy of joint localization in human pose estimation.

Key words: Human pose estimation, Channel attention, Spatial attention, Multiscale attention mapping, High resolution network
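
The abstract names the ingredients of the dual attention (a channel gate, plus spatial attention maps derived with 3×3 and 7×7 kernels) but does not give the layer-level design. The NumPy sketch below is only an illustration under stated assumptions: the channel gate follows the general Squeeze-and-Excitation pattern, each spatial map follows the general CBAM pattern (channel-wise average and max pooling, then one convolution and a sigmoid), and the two spatial maps are fused by a plain average. The function names, weight shapes, reduction ratio, and the averaging step are all hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """SE-style channel gate on a (C, H, W) feature map.

    w1 with shape (C // r, C) and w2 with shape (C, C // r) are
    hypothetical bottleneck weights (assumption, not from the paper).
    """
    squeezed = feat.mean(axis=(1, 2))         # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)   # ReLU bottleneck -> (C // r,)
    weights = sigmoid(w2 @ hidden)            # per-channel gate in (0, 1)
    return feat * weights[:, None, None]


def conv2d_same(x, kernel):
    """Zero-padded 'same' cross-correlation.

    x: (C_in, H, W), kernel: (C_in, k, k) with odd k; returns (H, W).
    """
    _, k, _ = kernel.shape
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return out


def spatial_attention_map(feat, kernel):
    """CBAM-style spatial map: pool over channels, convolve, squash."""
    avg_map = feat.mean(axis=0)               # (H, W)
    max_map = feat.max(axis=0)                # (H, W)
    stacked = np.stack([avg_map, max_map])    # (2, H, W)
    return sigmoid(conv2d_same(stacked, kernel))


def multiscale_spatial_attention(feat, kernel3, kernel7):
    """Apply spatial attention at two scales (3x3 and 7x7 kernels)."""
    map3 = spatial_attention_map(feat, kernel3)
    map7 = spatial_attention_map(feat, kernel7)
    fused = 0.5 * (map3 + map7)  # assumption: fuse the two maps by averaging
    return feat * fused[None, :, :]


# Random stand-ins for a feature map and learned weights.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 12))            # (C, H, W)
w1 = rng.standard_normal((2, 8)) * 0.1             # reduction ratio r = 4
w2 = rng.standard_normal((8, 2)) * 0.1
kernel3 = rng.standard_normal((2, 3, 3)) * 0.1
kernel7 = rng.standard_normal((2, 7, 7)) * 0.1

out = multiscale_spatial_attention(
    channel_attention(feat, w1, w2), kernel3, kernel7
)
print(out.shape)  # (8, 16, 12)
```

Both gates take values in (0, 1), so the output keeps the input shape and only rescales activations. The 7×7 kernel lets each attention value depend on a wider neighbourhood than the 3×3 kernel, which is the intuition behind combining two scales.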

CLC Number:

  • TP391.41