Human Keypoint Matching Network Based on Encoding and Decoding Residuals

Computer Science, 2020, 47(6): 114-120. doi: 10.11896/jsjkx.200300079

• Computer Graphics & Multimedia •

YANG Lian-ping¹, SUN Yu-bo¹, ZHANG Hong-liang¹, LI Feng², ZHANG Xiang-de¹

1 College of Sciences, Northeastern University, Shenyang 110004, China
2 School of Computer Science and Engineering, Northeastern University, Shenyang 110004, China

Received: 2020-03-13 Online: 2020-06-15 Published: 2020-06-10
Corresponding author: ZHANG Xiang-de (zhangxiangde@mail.neu.edu.cn)
About author: YANG Lian-ping (yanglp@mail.neu.edu.cn), born in 1979, Ph.D., associate professor, is a member of the China Computer Federation. His research interests include applied mathematics and computer vision.
ZHANG Xiang-de, born in 1963, Ph.D., professor. His main research interests include applied mathematics and artificial intelligence.
Supported by:
This work was supported by the Fundamental Research Funds for the Central Universities (N160504007) and the Joint Funds of the National Natural Science Foundation of China (U1811261).

Abstract

Abstract: Human pose estimation, and multi-person pose estimation in particular, is increasingly used in fields such as education and sports, and accurate, lightweight multi-person pose estimation is a current research focus. Bottom-up multi-person pose estimation methods have good real-time performance, but their accuracy is generally lower and their networks are relatively large. For keypoint association, the most difficult step in bottom-up methods, this paper proposes a lightweight and efficient pose estimation matching network. In the encoding stage, the network modifies the basic ResNet module into a layer structure; extracting features with these structures greatly reduces the number of model parameters. In the decoding stage, it uses a specially designed deconvolution structure, and residual connections are added throughout the network, which substantially improves accuracy. The overall algorithm correctly assigns all detected keypoint heatmaps to individual persons and produces the final human keypoint estimates. The proposed model is a lightweight, efficient human keypoint matching network: it reaches an mAP of 89.7 on the COCO ground truth with only 8.01 M parameters. Compared with the current best bottom-up multi-person pose estimation method, this improves mAP by 0.5 while using only about 1/10 of its parameters. The network was trained and validated separately on the ground truth of COCO 2017 and of COCO 2014 and achieved high accuracy on both, which shows that it accepts a variety of human keypoint heatmap inputs and performs well on them. In addition, several ablation experiments were designed for different layer structures of the network; the lightest structure has only 1.28 M parameters and still reaches an mAP of 81.8.

Key words: COCO dataset, Heat map, Human pose estimation, mAP value, Matching algorithm
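The lightweight claims above (8.01 M and 1.28 M parameters) are counts of learnable weights, which depend largely on how the encoder blocks and decoder deconvolution layers are built. The abstract does not describe the modified ResNet module in detail, so the following is only a generic back-of-the-envelope sketch. It assumes standard ResNet basic and bottleneck blocks with BatchNorm and a 4×4, stride-2 transposed convolution. The helper names and the channel width of 256 are hypothetical and do not reflect the paper's configuration.

```python
"""Back-of-the-envelope parameter and shape arithmetic for an encoder-decoder.

All helper names and the channel width used below are hypothetical
illustrations; they are not the paper's actual configuration.
"""


def conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Learnable weights of a k x k convolution or transposed convolution (groups=1)."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def bn_params(c: int) -> int:
    """BatchNorm learns one scale and one shift per channel (running stats are not learned)."""
    return 2 * c


def basic_block_params(c: int) -> int:
    """ResNet basic block: two 3x3 conv + BN pairs; the identity shortcut adds no weights."""
    return 2 * (conv_params(c, c, 3) + bn_params(c))


def bottleneck_params(c: int, expansion: int = 4) -> int:
    """ResNet bottleneck: 1x1 reduce, 3x3, 1x1 expand, each followed by BN."""
    mid = c // expansion
    return (
        conv_params(c, mid, 1) + bn_params(mid)      # 1x1 channel reduction
        + conv_params(mid, mid, 3) + bn_params(mid)  # 3x3 spatial conv at reduced width
        + conv_params(mid, c, 1) + bn_params(c)      # 1x1 channel expansion
    )


def deconv_out_size(size: int, k: int, stride: int, pad: int, out_pad: int = 0) -> int:
    """Spatial output size of a transposed convolution without dilation."""
    return (size - 1) * stride - 2 * pad + k + out_pad


if __name__ == "__main__":
    width = 256  # hypothetical channel width
    print("basic block:", basic_block_params(width))                  # 1180672
    print("bottleneck:", bottleneck_params(width))                    # 70400
    print("4x4 deconv:", conv_params(width, width, 4))                # 1048576
    print("deconv 28 ->", deconv_out_size(28, k=4, stride=2, pad=1))  # 56
```

At this hypothetical width, a bottleneck block needs roughly 1/17 of the weights of a basic block. An identity shortcut adds no weights at all, so residual connections between equal-shaped tensors do not increase model size. A 4×4 transposed convolution with stride 2 and padding 1 exactly doubles the spatial resolution. This is why such layers are a common choice for upsampling in a decoder.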

CLC number: TP391

YANG Lian-ping, SUN Yu-bo, ZHANG Hong-liang, LI Feng, ZHANG Xiang-de. Human Keypoint Matching Network Based on Encoding and Decoding Residuals[J]. Computer Science, 2020, 47(6): 114-120. https://doi.org/10.11896/jsjkx.200300079

