Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 221100007-5. doi: 10.11896/jsjkx.221100007

• Image Processing & Multimedia Technology •

Coupling Local Features and Global Representations for 2D Human Pose Estimation

CHEN Qiaosong1, WU Jiliang1, JIANG Bo1, TAN Chongchong2, SUN Kaiwei1, DENG Xin1, WANG Jin1

  1. School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  2. School of Automation / School of Industrial Internet, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Published: 2023-11-09
  • Corresponding author: CHEN Qiaosong (chenqs@cqupt.edu.cn)
  • About author: CHEN Qiaosong, born in 1978, Ph.D., associate professor. His main research interests include image processing, image understanding, artificial intelligence and computer vision.
  • Supported by: National Key Research and Development Program of China (2022YFE0101000).


Abstract: In recent years, both convolutional neural networks and Transformers have made progress in the field of human pose estimation. Convolutional neural networks (CNNs) are good at extracting local features, while Transformers do well at capturing global representations. However, studies combining the two for human pose estimation are few, and their results are unsatisfactory. To address this problem, this paper proposes CNPose (CNN-Nest Pose), a model that couples local features and global representations. The local-global feature coupling module of this framework uses multi-head attention and a residual structure to deeply couple local features with global representations. In addition, a local-global information exchange module is proposed to resolve the inconsistency between the data-source ranges of the local features and the global representations in the coupling module's computation. CNPose is validated on the COCO-val2017 and COCO-test-dev2017 datasets, and experimental results show that the CNPose model, which couples local features and global representations, outperforms similar methods.
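The abstract describes the coupling mechanism only at a high level: local CNN features and global Transformer representations are fused through multi-head attention with a residual connection. The following is a minimal NumPy sketch of that general idea, not the paper's actual CNPose implementation; the function name, the use of local features as queries against global representations as keys/values, and the random stand-in projection weights are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def couple_local_global(local_feats, global_reps, num_heads=4, seed=0):
    """Sketch of coupling CNN-style local features (used here as queries)
    with Transformer-style global representations (keys/values) via
    multi-head attention, followed by a residual connection.

    local_feats : (n_loc, d) array, global_reps : (n_glob, d) array.
    """
    n_loc, d = local_feats.shape
    n_glob, d_g = global_reps.shape
    assert d == d_g and d % num_heads == 0
    dh = d // num_heads
    rng = np.random.default_rng(seed)
    # Random projections stand in for learned Q/K/V/output weights.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    q = (local_feats @ Wq).reshape(n_loc, num_heads, dh)
    k = (global_reps @ Wk).reshape(n_glob, num_heads, dh)
    v = (global_reps @ Wv).reshape(n_glob, num_heads, dh)
    out = np.empty_like(q)
    for h in range(num_heads):
        # Scaled dot-product attention per head: local queries attend
        # over the global representations.
        attn = softmax(q[:, h] @ k[:, h].T / np.sqrt(dh))
        out[:, h] = attn @ v[:, h]
    # Residual structure: the attended global context is added back
    # onto the original local features.
    return local_feats + out.reshape(n_loc, d) @ Wo
```

The residual sum keeps the output in the same shape and space as the local features, so such a block can be stacked repeatedly, which is presumably what makes the "deep coupling" of the two feature types possible.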

Key words: Human pose estimation, Transformer, Convolutional neural networks, Local features, Global representations, Feature coupling, Attention

CLC number: TP391.4