Computer Science ›› 2020, Vol. 47 ›› Issue (6): 114-120. doi: 10.11896/jsjkx.200300079

• Computer Graphics & Multimedia •

Human Keypoint Matching Network Based on Encoding and Decoding Residuals

YANG Lian-ping¹, SUN Yu-bo¹, ZHANG Hong-liang¹, LI Feng², ZHANG Xiang-de¹

  ¹ College of Sciences, Northeastern University, Shenyang 110004, China
  ² School of Computer Science and Engineering, Northeastern University, Shenyang 110004, China
  • Received: 2020-03-13 Online: 2020-06-15 Published: 2020-06-10
  • About author: YANG Lian-ping, born in 1979, Ph.D., associate professor, is a member of China Computer Federation. His research interests include applied mathematics and computer vision.
    ZHANG Xiang-de, born in 1963, Ph.D., professor. His main research interests include applied mathematics and artificial intelligence.
  • Supported by:
    This work was supported by the Fundamental Research Funds for the Central Universities (N160504007) and the Joint Funds of the National Natural Science Foundation of China (U1811261).

Abstract: Human pose estimation, especially multi-person pose estimation, is increasingly applied in fields such as education and sports, and high-precision, lightweight multi-person pose estimation is an active research topic. Bottom-up multi-person pose estimation methods generally run in real time, but their accuracy is limited and their networks are large. To address the keypoint association problem, this paper proposes an efficient pose estimation matching network with few parameters. In the encoding stage, the network modifies the basic ResNet module to obtain its layer structures; extracting features with these structures greatly reduces the number of model parameters. In the decoding stage, a specially designed deconvolution structure is used, and residual connections are added throughout the network, which greatly improves accuracy. The algorithm assigns the keypoint heat maps to the correct person and produces the final human keypoint estimates. The proposed model is a portable and efficient human keypoint matching network: on the ground truth of the COCO dataset, its mAP reaches 89.7 with only 8.01 M parameters. Compared with the current best bottom-up multi-person pose estimation method, it improves mAP by 0.5 and reduces the parameter count to 1/10 of the original. The model is trained and validated on the COCO 2017 and COCO 2014 datasets and achieves high accuracy, which shows that it is suitable for keypoint heat-map inputs of a wide variety of human bodies. In addition, ablation experiments are designed for the different layer structures of the network; the lightest structure has only 1.28 M parameters and reaches an mAP of 81.8.
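The abstract attributes the small model size to modified ResNet modules but does not describe them. As a rough illustration only, the sketch below counts the parameters of the two standard residual block designs from the original ResNet work (a basic block and a bottleneck block), not the layer structure proposed in this paper. The channel width of 256, the reduction factor of 4, and all function names are assumptions chosen for the example.

```python
def conv_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Learnable parameters in a k x k 2D convolution: weights plus optional bias."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def basic_block_params(channels: int) -> int:
    """Two 3x3 convolutions, as in a standard basic residual block."""
    return 2 * conv_params(channels, channels, 3, bias=False)


def bottleneck_block_params(channels: int, reduction: int = 4) -> int:
    """1x1 reduce -> 3x3 -> 1x1 expand, as in a standard bottleneck residual block."""
    mid = channels // reduction
    return (
        conv_params(channels, mid, 1, bias=False)
        + conv_params(mid, mid, 3, bias=False)
        + conv_params(mid, channels, 1, bias=False)
    )


if __name__ == "__main__":
    c = 256  # hypothetical channel width, not taken from the paper
    basic = basic_block_params(c)
    slim = bottleneck_block_params(c)
    print(f"basic block:      {basic:,} parameters")
    print(f"bottleneck block: {slim:,} parameters")
    print(f"ratio:            {slim / basic:.3f}")
```

The identity skip connection itself adds no parameters, so in designs like these the block's convolutions determine its size. Normalization layers are left out of the count for simplicity.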

Key words: COCO datasets, Heat map, Human pose estimation, mAP value, Matching algorithm

CLC Number: TP391