Hypergraph Convolutional Network with Multi-perspective Topology Refinement for Skeleton-based Action Recognition

doi:10.11896/jsjkx.240600125

Abstract

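Abstract (from the Chinese version): Because the human skeleton is a naturally occurring topological structure, graph convolutional networks (GCNs) are widely used for skeleton-based human action recognition. However, current GCN-based methods attend only to low-order relationships between pairs of joints and overlook the latent high-order relationships among joints within joint groups. Existing methods also ignore how the spatial topology changes dynamically over time. These shortcomings limit model performance. To address them, K-NN is used to group highly correlated joints into hyperedges, and a hypergraph construction method and a hyperedge graph convolution are proposed to dynamically learn the high-order relationships among joints. In addition, a topology graph refined from the temporal and channel perspectives is designed to learn frame-level and channel-level correlations between joint pairs. Finally, a hypergraph convolutional network with multi-perspective topology refinement (HyperMTR-GCN) is developed for skeleton-based action recognition, and it shows a significant advantage on the NTU RGB+D and NTU RGB+D 120 datasets. Specifically, the proposed method improves on 2s-AGCN by 3.7% on the X-sub benchmark of NTU RGB+D and by 5.7% on the X-sub benchmark of NTU RGB+D 120.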

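Keywords (from the Chinese version): Action recognition, Graph convolutional network, Hypergraph neural network, Skeleton modeling, Topology refinement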

Abstract: Since the human skeleton is a natural topological structure, graph convolutional networks (GCNs) are widely used for skeleton-based human action recognition. In recent research, skeleton sequences are represented as spatio-temporal graphs, and topology graphs are used to model the correlations between human joints. However, current GCN-based methods focus only on pairwise joint relationships and ignore potential high-order relationships beyond them, which leaves the graph structure of skeleton data underutilized. To solve this problem, this paper uses hypergraphs to represent the potential high-order relationships among joints. Since these high-order relationships may vary from frame to frame within a skeleton sequence, the model dynamically learns the high-order correlations within each frame with the K-NN method and initializes the hypergraph structure from the high-level representations of the joints. Because the hyperedges adjust dynamically as the joint features evolve, this hypergraph structure can better learn the high-order relationships between joints. In current hypergraph neural networks, hypergraph convolution first transforms the hypergraph into a simple graph via a Laplacian transformation and then performs graph convolution, which does not fully exploit the characteristics of the hypergraph. The proposed hypergraph convolution method makes better use of the relationship between hyperedges and hypernodes by performing hyperedge graph convolution on each hyperedge to learn the high-order relationships between joints. The second problem with current GCN-based human action recognition methods is that the topology built to represent pairwise joint relationships is not dynamic enough; for example, the same topology is used for all frames in a sample. To fully explore the dynamic correlations between pairs of joints, a frame-wise topology modeling method is proposed to capture the correlations between joint pairs in different frames, and a channel-level topology modeling method is proposed to capture the correlations between different feature types. Finally, a hypergraph convolutional network with multi-perspective topology refinement (HyperMTR-GCN) is developed for skeleton-based action recognition, which shows a significant advantage on the NTU RGB+D and NTU RGB+D 120 datasets. Specifically, compared with 2s-AGCN, it improves by 3.7% on the X-sub benchmark of NTU RGB+D and by 5.7% on the X-sub benchmark of NTU RGB+D 120.
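The K-NN hyperedge construction described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the function names `knn_hyperedges` and `hyperedge_aggregate`, the use of squared Euclidean distance, one hyperedge per joint, and plain mean aggregation in place of the paper's learned hyperedge graph convolution.

```python
import numpy as np


def knn_hyperedges(features: np.ndarray, k: int) -> np.ndarray:
    """Build an incidence matrix H (joints x hyperedges) for a single frame.

    Assumption: hyperedge j groups the k joints closest to joint j in feature
    space (normally including joint j itself, whose distance is zero).
    """
    num_joints = features.shape[0]
    # Pairwise squared Euclidean distances between joint feature vectors.
    diff = features[:, None, :] - features[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # Row j holds the indices of the k joints nearest to joint j.
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    H = np.zeros((num_joints, num_joints))
    cols = np.repeat(np.arange(num_joints), k)
    H[nearest.ravel(), cols] = 1.0  # H[v, e] = 1 if joint v belongs to hyperedge e
    return H


def hyperedge_aggregate(features: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Joint -> hyperedge -> joint mean aggregation (a stand-in, not learned)."""
    edge_size = np.maximum(H.sum(axis=0), 1.0)[:, None]
    edge_feats = (H.T @ features) / edge_size      # mean over member joints
    node_degree = np.maximum(H.sum(axis=1), 1.0)[:, None]
    return (H @ edge_feats) / node_degree          # mean over incident hyperedges


# Toy frame: 5 joints with 1-D features forming two clusters.
frame = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
H = knn_hyperedges(frame, k=2)
out = hyperedge_aggregate(frame, H)
```

Because the abstract states that the hyperedges adjust as joint features evolve, a sketch like this would be re-run on each frame's current features rather than computed once per sample.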

Key words: Action recognition, Graph convolutional network, Hypergraph neural network, Skeleton modeling, Topology refinement

CLC number: TP391.41

HUANG Qian, SU Xinkai, LI Chang, WU Yirui. Hypergraph Convolutional Network with Multi-perspective Topology Refinement for Skeleton-based Action Recognition[J]. Computer Science, 2025, 52(5): 220-226. https://doi.org/10.11896/jsjkx.240600125

