Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 221100007-5. doi: 10.11896/jsjkx.221100007
CHEN Qiaosong1, WU Jiliang1, JIANG Bo1, TAN Chongchong2, SUN Kaiwei1, DENG Xin1, WANG Jin1
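Abstract: In recent years, both convolutional neural networks (CNNs) and Transformers have advanced human pose estimation. CNNs excel at extracting local features, while Transformers excel at capturing global representations, yet few studies combine the two for human pose estimation, and the existing results are unsatisfactory. To address this problem, this paper proposes CNPose (CNN-Nest Pose), a model that couples local features with global representations. Its local-global feature coupling module deeply couples local features and global representations through multi-head attention and a residual structure. A local-global information exchange module is also proposed to resolve the mismatch between the data-source ranges of the local features and the global representations that arises during computation in the coupling module. The model is validated on the COCO val2017 and COCO test-dev2017 datasets, and the experiments show that CNPose, with its coupling of local features and global representations, outperforms methods of the same type.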
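The abstract does not give the internals of the local-global feature coupling module, so the following is only a minimal sketch of the general idea it names: local feature tokens attend to global representation tokens through multi-head attention, and the result is added back through a residual connection. Everything here is an assumption made for illustration, not the CNPose implementation: the function names `multi_head_cross_attention` and `couple_local_global` are hypothetical, the learned query/key/value projections are replaced by identity maps, and tokens are plain Python lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_cross_attention(queries, keys_values, num_heads):
    """Each query token attends over all key/value tokens, one head at a time.

    Assumption: identity projections stand in for the learned Q/K/V weights,
    so every head simply works on its own slice of the channel dimension.
    """
    dim = len(queries[0])
    assert dim % num_heads == 0, "channel dimension must split evenly into heads"
    head_dim = dim // num_heads
    scale = 1.0 / math.sqrt(head_dim)

    output = []
    for q in queries:
        fused = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            # Scaled dot-product score between this query and every key.
            scores = [
                scale * sum(a * b for a, b in zip(q[lo:hi], kv[lo:hi]))
                for kv in keys_values
            ]
            weights = softmax(scores)
            # Weighted sum of the value slices for this head.
            for j in range(lo, hi):
                fused.append(sum(w * kv[j] for w, kv in zip(weights, keys_values)))
        output.append(fused)
    return output


def couple_local_global(local_tokens, global_tokens, num_heads=2):
    """Residual coupling: local features plus what they attend to globally."""
    attended = multi_head_cross_attention(local_tokens, global_tokens, num_heads)
    return [
        [x + a for x, a in zip(token, att)]
        for token, att in zip(local_tokens, attended)
    ]


if __name__ == "__main__":
    local_tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
    global_tokens = [[0.5, 0.5, 0.5, 0.5]]
    print(couple_local_global(local_tokens, global_tokens))
```

With a single global token the attention weights are all 1, so each local token is simply shifted by that global token; with several global tokens, each head mixes them according to similarity with the local token. The residual sum keeps the original local feature intact while adding global context, which is the role the abstract attributes to the residual structure.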