Computer Science, 2025, Vol. 52, Issue (10): 106-114. doi: 10.11896/jsjkx.240800108

• Computer Graphics & Multimedia •

Spatial-Temporal Joint Mapping for Skeleton-based Action Recognition

ZHAO Chen, PENG Jian, HUANG Junhao   

  1. College of Computer Science, Sichuan University, Chengdu 610065, China
  • Received: 2024-08-21 Revised: 2024-11-24 Online: 2025-10-15 Published: 2025-10-14
  • About author: ZHAO Chen, born in 1999, postgraduate. His main research interest is skeleton-based human action recognition.
    PENG Jian, born in 1970, Ph.D., professor, Ph.D. supervisor. His main research interests include artificial intelligence, Internet of Things technology and big data.
  • Supported by:
    Sichuan Science and Technology Program (2023YFG0115, 2023YFG0112), Sichuan Industrial Development Fund Industry Foundation Task Project (2023JB06, 2023JB03) and Cooperative Program of Sichuan University and Zigong (2022CDZG-6).

Abstract: Skeleton-based action recognition has attracted extensive research attention in recent years and has progressed rapidly. Graph convolutional networks (GCNs) and convolutional neural networks (CNNs) are powerful and effective model paradigms and are widely used in this field. However, 1) most GCN-based methods model spatial and temporal features alternately, which hinders the direct exchange of spatial-temporal information; and 2) CNN-based methods model spatial-temporal information effectively but, compared with GCN-based methods, make poor use of spatial information. To address these problems, this paper proposes a novel method called Spatial-Temporal Joint Mapping (STJM). The method combines the graph topology information used in GCN-based methods with the ability of CNN-based methods to aggregate spatial and temporal information simultaneously. Compared with the traditional GCN method, STJM maps the nodes into a high-dimensional space and has stronger representational power. After this high-dimensional mapping, a simple τ×K convolution kernel suffices to aggregate both temporal and spatial features. Because STJM is a new spatial-temporal aggregation module, many GCN-based topology enhancement strategies can be applied to the STJM block. The proposed method also outperforms previous models that aggregate spatial and temporal information simultaneously. Experiments show that combining the STJM block, used as a plug-and-play module, with a GCN exceeds the previous state-of-the-art models on two large-scale datasets: NTU RGB+D 60 and NTU RGB+D 120.
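The idea of mapping each joint into several slots and then applying a single τ×K kernel can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`topology_mapping`, `joint_mapping`, `st_conv`), the choice of K = 2 slots (the joint itself and the mean of its graph neighbours), and the zero padding in time are all assumptions made for illustration. The sketch only shows how one kernel spanning τ frames and K mapped slots can mix temporal and topological context in a single step.

```python
# Illustrative sketch only; names, K = 2 slots and zero padding are assumptions,
# not details taken from the paper.

def topology_mapping(adj):
    """Build a hypothetical mapping m[v][k][u] from an adjacency matrix.
    Slot 0 selects joint v itself; slot 1 averages v's graph neighbours."""
    n = len(adj)
    mapping = []
    for v in range(n):
        deg = sum(adj[v])
        self_slot = [1.0 if u == v else 0.0 for u in range(n)]
        nbr_slot = [adj[v][u] / deg if deg else 0.0 for u in range(n)]
        mapping.append([self_slot, nbr_slot])
    return mapping

def joint_mapping(x, m):
    """Expand features x[c][t][u] into z[c][t][v][k] using mapping m[v][k][u]."""
    return [[[[sum(m_vk[u] * frame[u] for u in range(len(frame)))
               for m_vk in m_v]
              for m_v in m]
             for frame in channel]
            for channel in x]

def st_conv(z, w):
    """Apply one tau x K kernel per (output, input) channel pair.
    z: [C][T][V][K], w: [O][C][tau][K] -> y: [O][T][V], zero-padded in time."""
    C, T, V = len(z), len(z[0]), len(z[0][0])
    tau, K = len(w[0][0]), len(w[0][0][0])
    pad = tau // 2
    y = [[[0.0] * V for _ in range(T)] for _ in w]
    for o, w_o in enumerate(w):
        for t in range(T):
            for v in range(V):
                acc = 0.0
                for c in range(C):
                    for dt in range(tau):
                        s = t + dt - pad  # source frame covered by the kernel
                        if 0 <= s < T:
                            for k in range(K):
                                acc += w_o[c][dt][k] * z[c][s][v][k]
                y[o][t][v] = acc
    return y

# Toy example: a 3-joint chain 0-1-2, one channel, three frames, tau = 3, K = 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
w = [[[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]]]
y = st_conv(joint_mapping(x, topology_mapping(adj)), w)
print(y[0][1][1])  # 30.0
```

In this toy setup each output value already depends on a joint, its neighbours, and the adjacent frames, whereas alternating spatial and temporal layers would need two separate steps to reach the same context.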

Key words: GCN, CNN, Action recognition, Spatial-temporal modeling, Skeleton sequence

CLC Number: 

  • TP183