Computer Science ›› 2026, Vol. 53 ›› Issue (2): 89-98. doi: 10.11896/jsjkx.250800007

• Educational Data Mining Based on Graph Machine Learning •

  • Corresponding author: ZHOU Yu (zhouyu_1022@126.com)
  • About author: (2410105035@mails.szu.edu.cn)

Multimodal Physical Education Data Fusion via Graph Alignment for Action Recognition

CHEN Haitao1, LIANG Junwei2, CHEN Chen3, WANG Yufan4, ZHOU Yu1   

  1. College of Computer Science and Software Engineering,Shenzhen University,Shenzhen,Guangdong 518060,China
    2 Shenzhen Institute of Information Technology,Shenzhen,Guangdong 518172,China
    3 Faculty of Liberal Arts,Northwest University,Xi’an 710127,China
    4 School of Mechanical Engineering,Shanghai Jiao Tong University,Shanghai 200240,China
  • Received:2025-08-04 Revised:2025-11-03 Online:2026-02-10
  • About author:CHEN Haitao,born in 2001,postgraduate.His main research interests include multimodal data fusion methods for action recognition.
    ZHOU Yu,born in 1987,Ph.D.,associate professor,is a member of CCF(No.P7618M).His main research interests include computational intelligence,machine learning and intelligent information processing.
  • Supported by:
    General Program of the National Natural Science Foundation of China(72271168),General Program of the Natural Science Foundation of Guangdong Province,China(2024A1515012485),Key Field Research and Development Program of Guangdong Province,China(2024B0101120003),Major Science and Technology Special Project of Shenzhen,China(KJZD20230923114111021),General Program of Shenzhen Basic Research,China(JCYJ20220810112354002) and Guangdong Basic and Applied Basic Research Regional Joint Fund-Young Scientists Fund(2023A1515110070).


Abstract: In the context of intelligent sports and educational informatization,fine-grained human action recognition has become a key technology in physical education and training assessment.To address the limitations of traditional methods in utilizing multi-modal information and capturing spatio-temporal structures in complex motion scenarios,this paper proposes a multi-modal graph convolutional network model that fuses skeleton data and wearable sensor information.Firstly,it proposes a fusion method based on “virtual sensors”,which maps wearable sensor signals onto a spatio-temporal graph constructed from skeletal joints,enabling effective integration of multimodal information and enhancing fine-grained motion modeling and cross-modal semantic consistency.Secondly,it designs a multi-layer graph convolutional network tailored for complex sports movements,incorporating local body part segmentation to improve recognition performance in challenging scenarios.Thirdly,it constructs a high-quality multimodal dataset for fencing,covering various technical actions and skill levels,to support fine-grained action recognition and skill assessment.Experimental results on both this dataset and several public benchmarks demonstrate that the proposed method outperforms existing approaches in both action recognition accuracy and skill level classification.This work provides a novel modeling framework and technical support for intelligent recognition and evaluation in sports education.
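The "virtual sensor" idea described above can be illustrated with a minimal sketch: sensor readings are attached as extra feature channels to the skeleton joints nearest each wearable device, and the resulting node features are passed through a plain graph convolution. All dimensions, the sensor-to-joint placement, and the normalized-adjacency update D⁻¹(A+I)XW are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

# Hypothetical dimensions: 25 skeleton joints with 3-D coordinates,
# 4 wearable IMUs each producing a 6-D signal (accelerometer + gyroscope).
NUM_JOINTS, JOINT_DIM, SENSOR_DIM = 25, 3, 6
# Hypothetical placement: each IMU is strapped near one joint.
SENSOR_TO_JOINT = {0: 4, 1: 8, 2: 14, 3: 18}

def build_virtual_sensor_graph(joints, imu):
    """Map wearable-sensor signals onto skeleton joints ("virtual sensors").

    joints: (NUM_JOINTS, JOINT_DIM) joint coordinates for one frame.
    imu:    (num_sensors, SENSOR_DIM) readings for the same frame.
    Returns a (NUM_JOINTS, JOINT_DIM + SENSOR_DIM) node-feature matrix;
    joints without a physical sensor get zero-padded sensor channels.
    """
    sensor_feat = np.zeros((NUM_JOINTS, SENSOR_DIM))
    for s, j in SENSOR_TO_JOINT.items():
        sensor_feat[j] = imu[s]
    return np.concatenate([joints, sensor_feat], axis=1)

def graph_conv(x, adj, weight):
    """One plain graph-convolution step with ReLU: D^-1 (A + I) X W."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv = 1.0 / a_hat.sum(axis=1, keepdims=True)
    return np.maximum(d_inv * (a_hat @ x) @ weight, 0.0)

# Demo on random data with a chain-shaped skeleton (placeholder topology).
rng = np.random.default_rng(0)
joints = rng.random((NUM_JOINTS, JOINT_DIM))
imu = rng.random((len(SENSOR_TO_JOINT), SENSOR_DIM))
adj = np.zeros((NUM_JOINTS, NUM_JOINTS))
for i in range(NUM_JOINTS - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
weight = rng.random((JOINT_DIM + SENSOR_DIM, 16)) * 0.1
fused = build_virtual_sensor_graph(joints, imu)
out = graph_conv(fused, adj, weight)
```

In this sketch the fused node features have 9 channels per joint and one convolution mixes information between adjacent joints, so sensor evidence placed at a wrist-like joint propagates along the skeleton; the paper's multi-layer network with body-part partitioning would stack and restrict such operations, which this toy example does not attempt.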

Key words: Action recognition, Multimodal data fusion, Graph convolutional network, Physical education, Fencing dataset

CLC Number: TP391