Computer Science, 2026, Vol. 53, Issue (5): 228-236. doi: 10.11896/jsjkx.250800025

• Computer Graphics & Multimedia •

High-accuracy Human Pose Estimation Combining Wavelet Analysis and Frequency-Domain Attention

LI Zongmin1,2, WANG Li1, LI Yachuan1, LIU Yujie1, RONG Guangcai1, LIU Weihan1, MA Wenkang1

  1. 1 Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
    2 Shandong Xiehe University, Jinan 250107, China
  • Received: 2025-08-06 Revised: 2025-09-15 Online: 2026-05-08
  • Corresponding author: LI Zongmin (lizongmin@upc.edu.cn)
  • About author: LI Zongmin, born in 1965, Ph.D, professor, Ph.D supervisor, is a member of CCF (No.11175S). His main research interests include computer graphics, digital image processing and pattern recognition.
  • Supported by:
    National Key Research and Development Program of China (2019YFF0301800), National Natural Science Foundation of China (61379106) and Natural Science Foundation of Shandong Province (ZR2013FM036, ZR2015FM011).

Abstract: Human pose estimation (HPE) is a fundamental task in computer vision. It aims to accurately localize human keypoints and understand body structure, which guides downstream tasks such as action recognition and detection. Although deep learning has driven significant progress in HPE, existing methods still struggle with large scale variations, occlusion, and loss of detail in complex scenes such as dense crowds and motion with large pose changes. To address these issues, this paper proposes EFW-HRNet, an improved architecture that fuses the discrete wavelet transform (DWT) with the High-Resolution Network (HRNet). First, DWT-based downsampling and feature fusion modules are introduced to capture and preserve multi-scale details. Second, a cross-band attention (CBA) module is designed to enable adaptive interaction among DWT sub-band features and improve robustness to occlusion. Third, a frequency band channel compression (FBCC) strategy compresses the high-frequency channels, significantly reducing computational redundancy and improving model efficiency. Experiments on the COCO dataset show that EFW-HRNet improves AP by 4.0 percentage points over the strong baseline UDP HRNet-W32. Ablation studies validate the effectiveness of the DWT modules, CBA, and FBCC; FBCC achieves a good trade-off between accuracy and efficiency, sacrificing only about 0.8 percentage points of AP in exchange for reductions of about 66% in parameters and about 51% in computational cost.

Key words: Human pose estimation, High-resolution network, Discrete wavelet transform, Frequency band channel compression, Attention mechanism
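
To illustrate why DWT-based downsampling can preserve detail that ordinary pooling discards, here is a minimal NumPy sketch of a single-level 2D Haar transform. It is not the authors' implementation: the abstract does not state which wavelet EFW-HRNet uses or how the sub-bands are fed to the network, so the Haar choice, the function names `haar_dwt2` and `haar_idwt2`, and the sub-band labels (conventions vary between libraries) are assumptions made for illustration only.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar DWT of an array with even height and width.

    Returns four half-resolution sub-bands. Labels are one common
    convention (an assumption here): LL is the low-pass band, the other
    three hold the detail that plain pooling would throw away.
    """
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # scaled block average
    lh = (a + b - c - d) / 2.0  # top row minus bottom row
    hl = (a - b + c - d) / 2.0  # left column minus right column
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature = rng.standard_normal((8, 8))  # stand-in for one feature channel
    bands = haar_dwt2(feature)
    print([band.shape for band in bands])  # four half-resolution sub-bands
    # The four sub-bands together reconstruct the input exactly.
    print(np.allclose(haar_idwt2(*bands), feature))
```

In this sketch the LL band equals twice the 2x2 average-pooled map, so average pooling amounts to keeping only one of the four sub-bands. Keeping the three high-frequency bands is what makes the downsampling invertible, and those extra channels are the kind of redundancy that a compression strategy such as FBCC would then target.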

CLC Number:

  • TP391