High-accuracy Human Pose Estimation Combining Wavelet Analysis and Frequency-Domain Attention

doi:10.11896/jsjkx.250800025

Abstract

Human pose estimation (HPE) is a fundamental task in computer vision that aims to accurately localize human keypoints and understand body structure, which is crucial for downstream tasks such as action recognition and detection. Although deep learning has driven significant progress in HPE, existing methods still struggle with large scale variations, occlusion, and loss of detail in complex scenarios such as dense crowds and dynamic movements with large pose changes. To address these issues, this paper proposes an improved architecture, EFW-HRNet, which fuses the discrete wavelet transform (DWT) with the High-Resolution Network (HRNet). First, it introduces DWT-based downsampling and feature fusion modules to capture and preserve multi-scale details. Second, it designs a Cross Band Attention (CBA) module to enable adaptive interaction among DWT sub-band features and enhance robustness against occlusion. Third, it applies a Frequency Band Channel Compression (FBCC) strategy to compress high-frequency channels, significantly reducing computational redundancy and improving model efficiency. Experiments on the COCO dataset show that EFW-HRNet improves AP by 4.0 percentage points over the strong baseline UDP HRNet-W32. Ablation studies validate the effectiveness of the DWT, CBA, and FBCC strategies; FBCC achieves a good trade-off between accuracy and efficiency, sacrificing only about 0.8 percentage points of AP in exchange for reducing parameters by about 66% and computational cost by about 51%.

Key words: Human pose estimation, High-resolution network, Discrete wavelet transform, Frequency band channel compression, Attention mechanism

CLC Number: TP391

LI Zongmin, WANG Li, LI Yachuan, LIU Yujie, RONG Guangcai, LIU Weihan, MA Wenkang. High-accuracy Human Pose Estimation Combining Wavelet Analysis and Frequency-Domain Attention[J]. Computer Science, 2026, 53(5): 228-236.
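
The abstract's claim that DWT-based downsampling preserves detail can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-level Haar wavelet (the abstract does not name the wavelet basis), operates on one 2D feature map with NumPy, and uses the hypothetical helper names `haar_dwt2` and `haar_idwt2`. It shows that the four half-resolution sub-bands together allow exact reconstruction of the input, which stride-2 pooling alone does not.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar DWT of an (H, W) float array with even H and W.

    Returns four sub-bands, each of shape (H/2, W/2). The LH/HL naming
    convention varies between libraries; the one used here is an assumption.
    """
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass along both axes
    lh = (a - b + c - d) / 2.0  # difference along the width axis
    hl = (a + b - c - d) / 2.0  # difference along the height axis
    hh = (a - b - c + d) / 2.0  # difference along both axes
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the (H, W) array from four sub-bands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.standard_normal((8, 6))
    bands = haar_dwt2(feature_map)
    restored = haar_idwt2(*bands)
    # Each sub-band has half the spatial resolution of the input.
    print([band.shape for band in bands])
    # The four sub-bands jointly retain all information in the input.
    print(np.allclose(restored, feature_map))
```

In a network, the three high-frequency sub-bands would typically be stacked along the channel axis next to the low-frequency one, which is presumably where a strategy such as FBCC could reduce channel count; how EFW-HRNet does this is not specified in the abstract.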


