Computer Science, 2026, Vol. 53, Issue (5): 228-236. doi: 10.11896/jsjkx.250800025
李宗民1,2, 王立1, 李亚传1, 刘玉杰1, 戎光彩1, 刘为韩1, 马文康1
LI Zongmin1,2, WANG Li1, LI Yachuan1, LIU Yujie1, RONG Guangcai1, LIU Weihan1, MA Wenkang1
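Abstract: Human pose estimation (HPE) is a fundamental task in computer vision. It aims to accurately localize human keypoints and understand human body structure, and it provides guidance for downstream tasks such as action recognition and detection. Driven by deep learning, current HPE methods have made breakthrough progress. However, in complex motion scenes with dense crowds and large pose variations, existing methods struggle with new challenges such as large variations in target scale, occlusion, and loss of fine detail. To address these problems, this paper proposes EFW-HRNet, an improved architecture that fuses the discrete wavelet transform (DWT) with the high-resolution network (HRNet). DWT-based downsampling and feature fusion modules are introduced to capture and preserve multi-scale details; a cross-band attention (CBA) module is designed to let DWT sub-band features interact adaptively, improving robustness to occlusion; and a frequency-band channel compression (FBCC) strategy compresses the high-frequency channels, substantially reducing computational redundancy and improving model efficiency. Experiments on the COCO dataset show that EFW-HRNet improves AP by 4.0 percentage points over the strong baseline UDP HRNet-W32. Ablation experiments verify the effectiveness of the DWT modules, CBA, and the FBCC strategy. In particular, FBCC strikes a good balance between accuracy and efficiency: it sacrifices only about 0.8 percentage points of AP in exchange for reductions of about 66% in parameter count and about 51% in computation.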
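The abstract gives no implementation details for the DWT-based downsampling module. As a hedged illustration only, the sketch below assumes the orthonormal Haar wavelet and shows why a DWT can stand in for strided downsampling without discarding information: each 2x2 block becomes one coarse coefficient (LL) and three detail coefficients (LH, HL, HH), so spatial resolution halves while the transform stays invertible. The function names `haar_dwt2` and `haar_idwt2` are hypothetical; this is not the EFW-HRNet code, and the LH/HL naming convention differs between libraries.

```python
# Minimal sketch (assumption): a single-level 2D Haar DWT used as a lossless
# downsampling step. Not the EFW-HRNet implementation; the wavelet choice,
# normalization, and function names are illustrative assumptions.
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(x: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Split an H x W map (H, W even) into four H/2 x W/2 sub-bands.

    With orthonormal Haar filters, each 2x2 block [[a, b], [c, d]] maps to
    LL=(a+b+c+d)/2, LH=(a+b-c-d)/2, HL=(a-b+c-d)/2, HH=(a-b-c+d)/2.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # LL: local average (coarse content)
            rows[1].append((a + b - c - d) / 2)  # LH: top minus bottom (horizontal edges)
            rows[2].append((a - b + c - d) / 2)  # HL: left minus right (vertical edges)
            rows[3].append((a - b - c + d) / 2)  # HH: diagonal detail
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def haar_idwt2(ll: Matrix, lh: Matrix, hl: Matrix, hh: Matrix) -> Matrix:
    """Invert haar_dwt2 exactly, rebuilding each 2x2 block from its four coefficients."""
    h2, w2 = len(ll), len(ll[0])
    out = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            s, v, hz, dg = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            out[2 * i][2 * j] = (s + v + hz + dg) / 2          # a
            out[2 * i][2 * j + 1] = (s + v - hz - dg) / 2      # b
            out[2 * i + 1][2 * j] = (s - v + hz - dg) / 2      # c
            out[2 * i + 1][2 * j + 1] = (s - v - hz + dg) / 2  # d
    return out


if __name__ == "__main__":
    x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    ll, lh, hl, hh = haar_dwt2(x)
    print(ll)  # [[7.0, 11.0], [23.0, 27.0]]
    print(hh)  # [[0.0, 0.0], [0.0, 0.0]]
    # Round trip recovers the input exactly, unlike 2x2 max pooling,
    # which would keep only [[6, 8], [14, 16]].
    assert haar_idwt2(ll, lh, hl, hh) == x
```

On this smooth ramp input the HH band is entirely zero. Wavelet detail bands of natural images tend to be sparse in a similar way, which is the kind of redundancy a step that compresses high-frequency channels, such as FBCC, could exploit; the abstract does not say how FBCC actually performs the compression.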