Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 221100007-5.doi: 10.11896/jsjkx.221100007

• Image Processing & Multimedia Technology •

Coupling Local Features and Global Representations for 2D Human Pose Estimation

CHEN Qiaosong1, WU Jiliang1, JIANG Bo1, TAN Chongchong2, SUN Kaiwei1, DEN Xin1, WANG Jin1   

  1 School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  2 School of Automation/School of Industrial Internet, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Published: 2023-11-09
  • About author: CHEN Qiaosong, born in 1978, Ph.D., associate professor. His main research interests include image processing, image understanding, artificial intelligence and computer vision.
  • Supported by: National Key Research and Development Program of China (2022YFE0101000).

Abstract: In recent years, both convolutional neural networks (CNNs) and Transformers have made progress in human pose estimation. CNNs are good at extracting local features, while Transformers excel at capturing global representations. However, few studies combine the two for human pose estimation, and the results of those that do are unsatisfactory. To address this problem, this paper proposes CNPose (CNN-Nest Pose), a model that couples local features and global representations. Its local-global feature coupling module uses multi-head attention and a residual structure to deeply couple local features and global representations. This paper also proposes a local-global information exchange module, which resolves the inconsistency between the ranges of the data sources of local features and global representations during computation in the local-global feature coupling module. CNPose is evaluated on the COCO val2017 and COCO test-dev2017 datasets. Experimental results show that CNPose, by coupling local features and global representations, outperforms similar methods.
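
The abstract does not give the internals of the local-global feature coupling module, so the following is only a minimal sketch of the general idea it names: multi-head attention combined with a residual connection. It assumes that local features are a flattened CNN feature map, that global representations are a set of Transformer tokens, and that local features act as queries over the global tokens. The function name `couple_features`, the weight matrices and all shapes are hypothetical and are not taken from CNPose.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def couple_features(local, global_tokens, w_q, w_k, w_v, w_o, num_heads):
    """Hypothetical coupling step: multi-head cross-attention from local
    features (queries) to global tokens (keys/values), plus a residual."""
    n, c = local.shape
    m = global_tokens.shape[0]
    d = c // num_heads  # channels per head (assumes c is divisible by num_heads)
    # Project and split channels into heads: (heads, positions, d).
    q = (local @ w_q).reshape(n, num_heads, d).transpose(1, 0, 2)
    k = (global_tokens @ w_k).reshape(m, num_heads, d).transpose(1, 0, 2)
    v = (global_tokens @ w_v).reshape(m, num_heads, d).transpose(1, 0, 2)
    # Scaled dot-product attention per head: (heads, n, m).
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)
    # Merge heads back to (n, c) and apply the output projection.
    mixed = (attn @ v).transpose(1, 0, 2).reshape(n, c) @ w_o
    # Residual structure: local features are kept and refined by global context.
    return local + mixed, attn

rng = np.random.default_rng(0)
c, heads = 32, 4
local = rng.standard_normal((64, c))    # e.g. an 8x8 feature map, flattened
tokens = rng.standard_normal((16, c))   # e.g. 16 global tokens
w_q, w_k, w_v, w_o = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(4))
coupled, attn = couple_features(local, tokens, w_q, w_k, w_v, w_o, heads)
print(coupled.shape, attn.shape)
```

Because of the residual term, the output keeps the shape of the local feature map, so under these assumptions such a step could be placed between existing CNN stages without changing the layers around it.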

Key words: Human pose estimation, Transformer, Convolutional neural networks, Local features, Global representations, Feature coupling, Attention

CLC Number: TP391.4