Computer Science, 2025, Vol. 52, Issue (3): 68-76. doi: 10.11896/jsjkx.240600063

• 3D Vision and Metaverse •

Multi-view Multi-person 3D Human Pose Estimation Based on Center-point Attention

JIANG Yiheng¹, LI Yang¹˒², LIU Chunyan¹, ZHAO Yunlong¹

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  2. Unmanned Aerial Vehicles Research Institute, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Received: 2024-06-06  Revised: 2024-09-26  Online: 2025-03-15  Published: 2025-03-07
  • About the authors: JIANG Yiheng, born in 1999, postgraduate. His main research interests include artificial intelligence and computer vision.
    LI Yang, born in 1986, Ph.D., is a member of CCF (No. J4845M). His main research interests include artificial intelligence, collective computing and privacy protection.
  • Supported by: National Science and Technology Major Project (2022ZD0115403).

Abstract: Multi-view multi-person 3D human pose estimation is widely used in computer vision tasks. Current methods based on spatial voxels consume substantial computational resources, which makes real-time computation on edge computing devices difficult. Regression-based methods, by contrast, generalize poorly because they lack geometric constraints: they cannot be applied directly in a new environment, and new data must be collected for fine-tuning. By combining the spatial voxel approach with regression-based pose estimation, we propose a multi-view multi-person 3D human pose estimation model based on center-point attention regression. The model first uses a small-scale voxel network to roughly estimate the position of each human body center and builds an initial pose from it. It then performs regression within the neighborhood of the body center point to obtain a more accurate pose. By incorporating spatial keypoint positions, the model's regression predictions become more accurate, and average accuracy improves by 1.16% at large scale. The model is also easy to train: in small-sample fine-tuning, accuracy improves by up to 12%. As a result, regression-based models can be deployed rapidly in new scenarios with only small amounts of training data, which greatly expands their generalization performance and versatility.
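The coarse-to-fine idea in the abstract (coarse voxel center estimate, initial pose around the center, regression restricted to the center's neighborhood) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `propose_centers`, `initial_pose` and `center_point_attention` are hypothetical; so are the 3×3×3 local-maximum rule for picking centers (similar in spirit to the max-pooling non-maximum suppression used by voxel-based methods such as VoxelPose), the template-offset initial pose, and the hard radius mask that confines attention to keys near a center.

```python
import numpy as np


def propose_centers(heatmap, origin, voxel_size, threshold=0.5):
    """Hypothetical coarse center proposal from a small voxel heatmap.

    heatmap: (X, Y, Z) center-likelihood scores (assumed output of a coarse voxel network).
    origin: world coordinate of voxel (0, 0, 0) (assumed convention).
    voxel_size: edge length of one voxel in world units (e.g. mm).
    Returns a (K, 3) array of world coordinates of 3x3x3 local maxima above threshold.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    X, Y, Z = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Max over each voxel's 3x3x3 neighborhood, built from 27 shifted views.
    neighborhood_max = np.full(heatmap.shape, -np.inf)
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                neighborhood_max = np.maximum(
                    neighborhood_max, padded[dx:dx + X, dy:dy + Y, dz:dz + Z]
                )
    # Plateaus of equal scores would all count as peaks; acceptable for a sketch.
    peaks = (heatmap == neighborhood_max) & (heatmap > threshold)
    idx = np.argwhere(peaks)  # (K, 3) integer voxel indices, row-major order
    return np.asarray(origin, dtype=float) + idx * voxel_size


def initial_pose(center, template_offsets):
    """Hypothetical initial pose: template joint offsets (J, 3) placed at the body center."""
    return np.asarray(center, dtype=float)[None, :] + np.asarray(template_offsets, dtype=float)


def center_point_attention(query, keys, values, key_positions, center, radius):
    """Scaled dot-product attention restricted to keys within `radius` of `center`.

    query: (D,), keys: (N, D), values: (N, C), key_positions: (N, 3).
    Returns the attended value (C,) and the attention weights (N,).
    """
    query = np.asarray(query, dtype=float)
    dist = np.linalg.norm(np.asarray(key_positions, dtype=float) - np.asarray(center, dtype=float), axis=1)
    mask = dist <= radius
    if not mask.any():
        raise ValueError("no keys within radius of the center")
    scores = np.asarray(keys, dtype=float) @ query / np.sqrt(query.shape[0])
    scores = np.where(mask, scores, -np.inf)  # keys outside the radius are excluded
    scores = scores - scores[mask].max()      # numerical stability
    weights = np.exp(scores)                  # exp(-inf) == 0 for masked keys
    weights = weights / weights.sum()
    return weights @ np.asarray(values, dtype=float), weights
```

In a fuller pipeline one might imagine each joint of the initial pose issuing such a query over multi-view features near its body center and regressing a refined position from the result; how the paper actually builds its queries and keys is not specified in the abstract.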

Key words: 3D human pose estimation, Multi-view, Center-point proposal network, Center-point attention, Transformer, VoxelNet

CLC Number: TP311