Computer Science (计算机科学), 2025, Vol. 52, Issue 3: 68-76. doi: 10.11896/jsjkx.240600063
江以恒1, 李洋1,2, 刘春颜1, 赵蕴龙1
JIANG Yiheng1, LI Yang1,2, LIU Chunyan1, ZHAO Yunlong1
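Abstract: Multi-view multi-person 3D human pose estimation is widely used in computer vision tasks. Current voxel-based methods consume so many resources that real-time operation on edge computing devices is hard to achieve, while regression-based methods lack geometric constraints and therefore generalize poorly: they cannot be applied directly in a new environment and require newly collected data for fine-tuning. By combining the voxel-based approach with regression-based pose estimation and fusing the strengths of both, this paper proposes a multi-view multi-person 3D human pose estimation model based on center-point attention regression. The model uses a small voxel network to coarsely estimate the position of each human center point and builds an initial pose from it, then performs regression within the range of the center point to obtain a more precise pose. Incorporating the spatial keypoint positions makes the regression more accurate, improving average accuracy at large scale by 1.16%, and also makes the model very easy to train, improving accuracy by up to 12% in few-shot fine-tuning. As a result, the regression-based model can be deployed quickly in new scenes after training on only a small amount of data, which substantially improves its generalization and versatility.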
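The coarse-to-fine idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the voxel scores are given directly instead of coming from a voxel network, the learned attention regression is replaced by hand-supplied per-joint offsets, and the names `TEMPLATE_OFFSETS`, `coarse_center`, `initial_pose`, and `refine`, along with the grid size and radius, are all hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical template: joint offsets (in metres) from the body centre.
TEMPLATE_OFFSETS = {
    "head": (0.0, 0.0, 0.75),
    "pelvis": (0.0, 0.0, 0.0),
    "left_foot": (-0.25, 0.0, -1.0),
    "right_foot": (0.25, 0.0, -1.0),
}


def coarse_center(scores, origin, voxel_size):
    """Return the world-space centre of the highest-scoring voxel.

    `scores` is a nested list indexed [ix][iy][iz]. In a real system it
    would be produced by a small voxel network; here it is simply given.
    """
    nx, ny, nz = len(scores), len(scores[0]), len(scores[0][0])
    best = max(product(range(nx), range(ny), range(nz)),
               key=lambda i: scores[i[0]][i[1]][i[2]])
    return tuple(o + (i + 0.5) * voxel_size for o, i in zip(origin, best))


def initial_pose(center):
    """Place the template skeleton at the estimated centre point."""
    return {name: tuple(c + d for c, d in zip(center, offset))
            for name, offset in TEMPLATE_OFFSETS.items()}


def refine(pose, center, offsets, radius):
    """Apply per-joint offsets, keeping every joint within `radius` of the centre.

    `offsets` stands in for the output of a regression head (an assumption).
    """
    refined = {}
    for name, joint in pose.items():
        shift = offsets.get(name, (0.0, 0.0, 0.0))
        moved = tuple(j + s for j, s in zip(joint, shift))
        delta = tuple(m - c for m, c in zip(moved, center))
        dist = sum(d * d for d in delta) ** 0.5
        if dist > radius:
            # Pull the joint back onto the sphere around the centre point.
            scale = radius / dist
            moved = tuple(c + d * scale for c, d in zip(center, delta))
        refined[name] = moved
    return refined


# Toy 2x2x2 score volume with 1 m voxels; the peak is at index (1, 0, 1).
scores = [
    [[0.1, 0.2], [0.0, 0.1]],
    [[0.3, 0.9], [0.2, 0.1]],
]
center = coarse_center(scores, origin=(0.0, 0.0, 0.0), voxel_size=1.0)
pose = initial_pose(center)
refined = refine(pose, center,
                 offsets={"head": (0.0, 0.0, 0.25), "left_foot": (-5.0, 0.0, 0.0)},
                 radius=1.5)
print(center)
print(refined["head"])
```

Restricting the refinement to a bounded region around the coarse centre is what keeps the second stage cheap in this sketch: the voxel grid only needs to be fine enough to locate a person, not individual joints.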