Study on Human Pose Estimation Based on Multiscale Dual Attention

doi:10.11896/jsjkx.220100057

Computer Science, 2022, Vol. 49, Issue 11A: 220100057-5. doi: 10.11896/jsjkx.220100057

• Image Processing & Multimedia Technology •

Study on Human Pose Estimation Based on Multiscale Dual Attention

MA Wan-yi, ZHANG De-ping

School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211000, China

Online: 2022-11-10 Published: 2022-11-21
About author: MA Wan-yi, born in 1996, postgraduate, is a member of the China Computer Federation. Her main research interests include image processing and artificial intelligence modeling.
ZHANG De-ping, born in 1973, Ph.D., postgraduate supervisor, is a member of the China Computer Federation. His main research interests include image processing and artificial intelligence modeling.
Supported by: National Defense Basic Scientific Research Key Program (JCKY2020605C003).

Abstract

Human pose estimation suffers from low discrimination between the human body and the background, and HRNet-based pose estimation does not fully exploit important feature information. To address both problems, this paper proposes MDA-HRNet, a human pose estimation method based on multiscale dual attention that uses channel and spatial attention mechanisms. Considering both the channel domain and the spatial domain, the Ca-Neck and Ca-Block modules, which incorporate channel attention, and the Sa-Block module, which incorporates spatial attention, are designed. These modules are then integrated into the high-resolution network structure so that the network pays more attention to the human body region in the image. Moreover, the Sa-Block module uses 3×3 and 7×7 convolution kernels to derive two spatial attention maps at different scales, which strengthens the network's ability to distinguish human features from background features and thus to locate the human body and its keypoints accurately. The proposed method is evaluated on the MPII dataset, and the results show that MDA-HRNet effectively improves the accuracy of joint localization in human pose estimation.
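The two-scale spatial attention described for the Sa-Block can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract only states that 3×3 and 7×7 kernels produce two spatial attention maps, so the CBAM-style channel pooling (mean and max over channels), the sigmoid gating, and the averaging of the two maps are all assumptions made here for illustration, and every function name is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x, kernel):
    """Zero-padded 'same' cross-correlation.

    x: (C, H, W) input, kernel: (C, k, k) with odd k.
    Sums over channels and returns an (H, W) map.
    """
    _, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            # Weight each shifted window by the kernel tap at (i, j).
            out += np.einsum("c,chw->hw", kernel[:, i, j], xp[:, i:i + h, j:j + w])
    return out


def spatial_attention_map(feat, kernel):
    """Assumed CBAM-style map: pool over channels, convolve, squash to (0, 1)."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    return sigmoid(conv2d_same(pooled, kernel))


def multiscale_spatial_attention(feat, k3, k7):
    """Combine a 3x3-scale and a 7x7-scale attention map.

    Averaging the two maps is an assumption; the source does not say
    how the two scales are fused.
    """
    a3 = spatial_attention_map(feat, k3)
    a7 = spatial_attention_map(feat, k7)
    attn = 0.5 * (a3 + a7)
    return feat * attn[None, :, :], attn


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 12))    # (channels, height, width)
k3 = 0.1 * rng.standard_normal((2, 3, 3))  # stand-ins for learned weights
k7 = 0.1 * rng.standard_normal((2, 7, 7))
out, attn = multiscale_spatial_attention(feat, k3, k7)
```

The small kernel responds to fine local structure and the large kernel to broader context, which is one plausible reading of why two scales help separate body regions from background. In a trained network the kernels would be learned; random values are used here only to make the sketch executable.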

Key words: Human pose estimation, Channel attention, Spatial attention, Multiscale attention mapping, High resolution network

CLC Number: TP391.41

MA Wan-yi, ZHANG De-ping. Study on Human Pose Estimation Based on Multiscale Dual Attention[J]. Computer Science, 2022, 49(11A): 220100057-5.


