Computer Science ›› 2026, Vol. 53 ›› Issue (2): 57-66. doi: 10.11896/jsjkx.250500100

• Educational Data Mining Based on Graph Machine Learning •

CPViG-Net: Students’ Classroom Behavior Recognition Based on Cross-stage Visual Graph Convolution

ZHANG Haopeng1, SHI Zheng2, LIU Feng1,2, SONG Wanru2

  1. 1 School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
    2 School of Educational Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2025-05-26 Revised: 2025-09-15 Online: 2026-02-10
  • Corresponding author: SONG Wanru (songwanru@njupt.edu.com)
  • Author contact: (1224014629@njupt.edu.cn)
  • About author: ZHANG Haopeng, born in 2001, postgraduate. His main research interests include image processing and AI for education.
    SONG Wanru, born in 1992, Ph.D. Her main research interests include image processing, pattern recognition and AI for education.
  • Supported by:
    National Natural Science Foundation of China (62307025, 62177029) and 2023 Education and Teaching Reform Project of Nanjing University of Posts and Telecommunications (JG01723JX71).

Abstract: With the evolution of educational paradigms from “human-computer collaboration” to “human-intelligence collaborative co-education”, the intelligent evaluation of classroom teaching faces new requirements and challenges. In recent years, tasks that take student behavior as their starting point have gained widespread attention. To address the diverse student behaviors, frequent occlusion, and severe background interference found in real classroom environments, a cross-stage partial vision graph network (CPViG-Net) is proposed to improve the accuracy of student behavior recognition in complex classroom settings. Built on a classic object detection framework, the model integrates the dynamic feature modeling ability of the vision GNN and constructs a partial max-relative graph convolution (PMG) module and a cross-stage partial fusion (CPF) module. The PMG module embeds max-relative graph convolution to capture the neighborhood information with the greatest feature differences between nodes, which specifically addresses the information loss caused by local occlusion. It also incorporates depthwise separable convolution to reduce the computational cost of the graph convolution algorithm. The CPF module restructures features using fully connected layers and leverages the cross-stage connection mechanism of the C2f module to achieve multi-level feature fusion, thereby enhancing the model's ability to recognize small-scale objects. In addition, the model provides optimization strategies for different datasets by tuning the number of nearest neighbors K. On the public dataset SCB03-S, the mAP@50 of CPViG-Net reaches 70.9%, a 2-percentage-point improvement over the baseline model. Experiments on multiple public datasets demonstrate that the model performs well and is robust when addressing the various challenges of student behavior recognition in real classroom scenarios.

Key words: Student behavior, Max-relative graph convolution, Multi-scale object recognition, Occlusion, Depthwise separable convolution
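
The abstract describes max-relative graph convolution as capturing the neighborhood information with the greatest feature differences between nodes, with the number of nearest neighbors K as a tunable parameter. The sketch below illustrates that general operation in NumPy under stated assumptions: it follows the commonly used formulation (concatenate each node's feature with the element-wise maximum of its neighbor-minus-node differences, then apply a linear map), builds the graph with Euclidean K-nearest neighbors, and uses hypothetical names (`knn_indices`, `max_relative_graph_conv`, `weight`). It is not the authors' PMG module, which the abstract does not specify in detail.

```python
import numpy as np


def knn_indices(x, k):
    """Return, for every node, the indices of its k nearest neighbors.

    x: (N, C) array of node features. Distance is squared Euclidean
    (an assumption for this sketch); a node is never its own neighbor.
    """
    sq = np.sum(x ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)  # (N, N)
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]              # (N, k)


def max_relative_graph_conv(x, weight, k):
    """Max-relative graph convolution over a KNN graph.

    x:      (N, C) node features
    weight: (2C, C_out) linear map applied after aggregation (hypothetical)
    k:      number of nearest neighbors
    """
    idx = knn_indices(x, k)               # (N, k)
    neighbors = x[idx]                    # (N, k, C)
    relative = neighbors - x[:, None, :]  # neighbor minus center node
    max_rel = relative.max(axis=1)        # (N, C) largest difference per channel
    return np.concatenate([x, max_rel], axis=1) @ weight


if __name__ == "__main__":
    feats = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
    # Identity weight exposes the concatenated [x_i, max-relative] vector.
    print(max_relative_graph_conv(feats, np.eye(4), k=1))
```

With an identity `weight`, each output row is the node's own feature followed by its aggregated max-relative feature, which makes it easy to see how changing `k` changes what each node gathers from its neighborhood.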

CLC Number:

  • TP391
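
The abstract states that depthwise separable convolution is used to reduce the computational cost of graph convolution. As a rough illustration of why this helps, the sketch below compares weight counts for a standard convolution and a depthwise separable one. The function names and the channel and kernel sizes in the example are assumptions chosen for illustration, not values from the paper, and bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not taken from the paper).
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(std, sep, sep / std)
```

The ratio of the two counts simplifies to 1/c_out + 1/k², so for a 3 x 3 kernel with many output channels the separable form needs roughly one ninth of the weights.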