Computer Science ›› 2025, Vol. 52 ›› Issue (3): 68-76. doi: 10.11896/jsjkx.240600063

• 3D Vision and Metaverse •

Multi-view Multi-person 3D Human Pose Estimation Based on Center-point Attention

JIANG Yiheng1, LI Yang1,2, LIU Chunyan1, ZHAO Yunlong1

  1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  2 Unmanned Aerial Vehicles Research Institute, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Received: 2024-06-06 Revised: 2024-09-26 Online: 2025-03-15 Published: 2025-03-07
  • About author: JIANG Yiheng, born in 1999, postgraduate. His main research interests include artificial intelligence and computer vision.
    LI Yang, born in 1986, Ph.D., is a member of CCF (No. J4845M). His main research interests include artificial intelligence, collective computing and privacy protection.
  • Supported by:
    National Science and Technology Major Project (2022ZD0115403).

Abstract: Multi-view multi-person 3D human pose estimation is widely used in computer vision tasks. Current spatial voxel-based methods consume so many resources that they struggle to run in real time on edge computing devices. Regression-based methods, in turn, generalize poorly because they lack geometric constraints: they cannot be applied directly in a new environment and require newly collected data for fine-tuning. Combining the spatial voxel method with the regression-based pose estimation method, we propose a multi-view multi-person 3D human pose estimation model based on center-point attention regression. The model first coarsely estimates the position of the human body center with a small-scale voxel network and constructs an initial pose from it. Regression prediction is then carried out within the range of the body center point to obtain a more accurate human pose. By incorporating spatial key-point positions, the model's regression prediction becomes more accurate, and the average accuracy improves by 1.16% at large scales. The model is also easy to train: accuracy improves by up to 12% in small-sample fine-tuning. As a result, regression-based models can be deployed rapidly in new scenarios with small amounts of training data, which greatly improves their generalization performance and versatility.
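
The coarse-to-fine flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' model: the small-scale voxel network is replaced by an argmax over a given score volume, the learned regression is replaced by an offset array passed in by the caller, and the "range of the human body center point" is approximated by clamping joints to a sphere of assumed radius. All function names, parameters and the template skeleton are hypothetical.

```python
import numpy as np


def coarse_center(volume, grid_min, voxel_size):
    """Stand-in for the voxel network: take the best-scoring voxel
    and return the world coordinate of its center."""
    idx = np.unravel_index(np.argmax(volume), volume.shape)
    return grid_min + (np.array(idx) + 0.5) * voxel_size


def initial_pose(center, template_offsets):
    """Build an initial pose by placing a template skeleton
    (J x 3 joint offsets, an assumption) at the estimated center."""
    return center[None, :] + template_offsets


def refine_pose(pose, center, predicted_offsets, radius):
    """Apply per-joint offsets (assumed to come from a regressor),
    then pull any joint farther than `radius` from the center back
    onto the sphere so the result stays near the center point."""
    refined = pose + predicted_offsets
    rel = refined - center
    dist = np.linalg.norm(rel, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(dist, 1e-9))
    return center + rel * scale
```

A caller would chain the three steps: estimate the center from a coarse volume, place the template, then refine with offsets. The clamp is only one simple way to express a center-bounded search region; an attention mechanism restricted to features near the center would play that role in a learned model.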

Key words: 3D human pose estimation, Multi-view, Center-point proposal network, Center-point attention, Transformer, VoxelNet

CLC Number: 

  • TP311