Computer Science ›› 2026, Vol. 53 ›› Issue (2): 89-98. doi: 10.11896/jsjkx.250800007

• Educational Data Mining Based on Graph Machine Learning •

Multimodal Physical Education Data Fusion via Graph Alignment for Action Recognition

CHEN Haitao1, LIANG Junwei2, CHEN Chen3, WANG Yufan4, ZHOU Yu1   

  1 College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
  2 Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518172, China
  3 Faculty of Liberal Arts, Northwest University, Xi’an 710127, China
  4 School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received:2025-08-04 Revised:2025-11-03 Published:2026-02-10
  • About author: CHEN Haitao, born in 2001, postgraduate. His main research interests include multimodal data fusion methods for action recognition.
    ZHOU Yu, born in 1987, Ph.D., associate professor, is a member of CCF (No. P7618M). His main research interests include computational intelligence, machine learning, and intelligent information processing.
  • Supported by:
    Surface Project of the National Natural Science Foundation of China (72271168), Surface Project of the Natural Science Foundation of Guangdong Province, China (2024A1515012485), Key Field Research and Development Program of Guangdong Province, China (2024B0101120003), Major Science and Technology Special Project of Shenzhen, China (KJZD20230923114111021), Surface Project of the Basic Research of Shenzhen, China (JCYJ20220810112354002), and Joint Funds of the Basic and Applied Basic Research Area of Guangdong Province, China - the Program of the Young Scientists Fund (2023A1515110070).

Abstract: In the context of intelligent sports and educational informatization, fine-grained human action recognition has become a key technology for physical education and training assessment. Traditional methods make limited use of multimodal information and struggle to capture spatio-temporal structure in complex motion scenarios. To address these limitations, this paper proposes a multimodal graph convolutional network model that fuses skeleton data with wearable sensor information. First, it introduces a fusion method based on "virtual sensors" that maps wearable sensor signals onto a spatio-temporal graph constructed from skeletal joints, integrating the two modalities and improving fine-grained motion modeling and cross-modal semantic consistency. Second, it designs a multi-layer graph convolutional network tailored to complex sports movements, incorporating local body-part segmentation to improve recognition in challenging scenarios. Third, it constructs a high-quality multimodal fencing dataset covering various technical actions and skill levels to support fine-grained action recognition and skill assessment. Experimental results on this dataset and several public benchmarks show that the proposed method outperforms existing approaches in both action recognition accuracy and skill-level classification. This work provides a novel modeling framework and technical support for intelligent recognition and evaluation in sports education.

Key words: Action recognition, Multimodal data fusion, Graph convolutional network, Physical education, Fencing dataset

CLC Number: 

  • TP391
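
The abstract describes mapping wearable sensor signals onto a skeleton graph as "virtual sensors" but does not give the construction. The sketch below is one possible reading, not the authors' implementation: it assumes each sensor becomes an extra graph node linked to the joint it is worn on, that joint and sensor features are zero-padded to a common width, and that a single symmetric-normalized graph convolution is applied per frame. All function names, the toy skeleton, and the feature sizes are hypothetical.

```python
import numpy as np

def build_fused_adjacency(num_joints, bones, sensor_to_joint):
    """Adjacency over skeleton joints plus one virtual node per sensor.

    Assumption: sensor s is attached to joint sensor_to_joint[s].
    """
    n = num_joints + len(sensor_to_joint)
    adj = np.eye(n)  # self-loops keep each node's own features
    for i, j in bones:
        adj[i, j] = adj[j, i] = 1.0
    for s, joint in enumerate(sensor_to_joint):
        v = num_joints + s  # index of the virtual sensor node
        adj[v, joint] = adj[joint, v] = 1.0
    return adj

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def fuse_features(joints_xyz, imu):
    """Stack joint (T, J, Cj) and sensor (T, S, Cs) features as graph nodes.

    Channels are zero-padded to max(Cj, Cs); this padding is an assumption.
    """
    width = max(joints_xyz.shape[2], imu.shape[2])

    def pad(x):
        return np.pad(x, ((0, 0), (0, 0), (0, width - x.shape[2])))

    return np.concatenate([pad(joints_xyz), pad(imu)], axis=1)

def graph_conv(x, adj_norm, weight):
    """One spatial graph convolution with ReLU, applied to every frame.

    x: (T, N, C_in), adj_norm: (N, N), weight: (C_in, C_out).
    """
    return np.maximum(np.einsum("uv,tvc,cd->tud", adj_norm, x, weight), 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bones = [(0, 1), (1, 2), (2, 3), (2, 4)]  # toy 5-joint skeleton
    adj = build_fused_adjacency(5, bones, sensor_to_joint=[3, 4])
    x = fuse_features(rng.normal(size=(4, 5, 3)),   # 3D joint positions
                      rng.normal(size=(4, 2, 6)))   # 6-axis sensor readings
    out = graph_conv(x, normalize(adj), rng.normal(size=(6, 8)))
    print(out.shape)
```

In this reading, a sensor node exchanges information with the rest of the body only through the joint it is attached to, so stacking several such layers would let sensor signals propagate along the skeleton. Temporal convolution and body-part segmentation, which the abstract also mentions, are left out of the sketch.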