Computer Science, 2026, Vol. 53, Issue (3): 246-256. doi: 10.11896/jsjkx.241100165

• Computer Graphics & Multimedia •

Student Behavior Detection Method Based on Improved YOLO Algorithm

WANG Xinyu1, GAO Donghuai2, NING Yuwen2, XU Hao2, QI Haonan1

  1 Network and Data Center, Northwest University, Xi’an 710127, China
  2 Teaching and Research Support Center, Air Force Medical University, Xi’an 710032, China
  • Received: 2024-11-27 Revised: 2025-08-20 Online: 2026-03-12
  • Corresponding author: NING Yuwen (ningyuwen@163.com)
  • About author: WANG Xinyu, born in 1999, postgraduate (2804503826@qq.com). His main research interests include smart teaching and object detection.
    NING Yuwen, born in 1984, associate professor, master’s supervisor. His main research interests include intelligent development technology and application of teaching resources.
  • Supported by:
    Shaanxi Provincial Department of Science and Technology 2022 Key Research and Development Program (2022SF-068), Air Force Medical University Teaching Support Research Program (2022JB-03), and 2023 Shaanxi Undergraduate and Higher Continuing Education Teaching Reform Research Project (23BY206).

Abstract: Student behavior detection in classroom scenarios suffers from large scale variations, severe occlusion, and a heavy computational burden, which together make it difficult to deploy widely. To address these problems, this paper proposes BDEO-YOLO, a lightweight student classroom behavior detection method based on an improved YOLOv8. First, dynamic convolution is introduced into the C2f module of YOLOv8n, enhancing the model’s adaptability to complex classroom scenes and its feature representation ability. Second, the multi-scale feature fusion of the model is optimized by combining the Bidirectional Feature Pyramid Network (BiFPN) with the Global-to-Local Spatial Aggregation (GLSA) module, and the Efficient Local Attention (ELA) mechanism is introduced into the backbone, enhancing the model’s ability to detect small targets and fine details. Finally, a lightweight detection head structure, one13, is designed to simplify the feature extraction process and substantially reduce the computational burden. Experimental results on the public dataset STBD-08 show that BDEO-YOLO reaches an mAP of 92.2%, 1.3 percentage points higher than the original YOLOv8n, while the computational cost drops from 8.1 GFLOPs to 4.8 GFLOPs, a 40.7% reduction, and the model size is only 5.7 MB, which verifies the effectiveness of the lightweight design. Validation on the public datasets SCB-Dataset3 and VOC2007 shows that the improved algorithm improves on all performance metrics, confirming the generalization ability of the model and its robustness to occlusion, scale changes, and illumination changes in the classroom.

Key words: Student behavior detection, Lightweight, Dynamic convolution, BiFPN, Attention mechanism
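The abstract does not say how dynamic convolution is wired into the C2f module, so the following is only a minimal sketch of the general dynamic-convolution idea, not the BDEO-YOLO implementation. Several expert kernels are mixed with input-dependent softmax coefficients, and the mixed kernel is then applied as one ordinary convolution. For readability it assumes a single-channel input and replaces the usual pooling-plus-MLP attention branch with a hypothetical one-layer gate (`gate_weights`, `gate_bias`); all function names are illustrative.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def dynamic_conv2d(image, expert_kernels, gate_weights, gate_bias):
    # 1) Global average pooling summarizes the input (single channel in this sketch).
    pooled = sum(sum(row) for row in image) / (len(image) * len(image[0]))
    # 2) Hypothetical one-layer gate: one logit per expert, softmax -> mixing coefficients.
    alphas = softmax([g * pooled + b for g, b in zip(gate_weights, gate_bias)])
    # 3) Mix the experts into a single kernel, then convolve only once.
    kh, kw = len(expert_kernels[0]), len(expert_kernels[0][0])
    mixed = [[sum(a * k[r][c] for a, k in zip(alphas, expert_kernels)) for c in range(kw)]
             for r in range(kh)]
    return conv2d_valid(image, mixed), alphas

image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
center_1 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
center_3 = [[0, 0, 0], [0, 3, 0], [0, 0, 0]]
out, alphas = dynamic_conv2d(image, [center_1, center_3], [0.0, 0.0], [0.0, 0.0])
print(alphas)  # [0.5, 0.5]
print(out)     # [[12.0, 14.0], [20.0, 22.0]]
```

Because the experts are merged before convolving, the convolution itself costs the same as a single static kernel; only the small gate and the kernel mixing are extra, which is why dynamic convolution suits lightweight detectors.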
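BiFPN, as introduced with EfficientDet, fuses feature maps of different scales with learnable non-negative weights using "fast normalized fusion": O = sum_i w_i * I_i / (eps + sum_j w_j). The sketch below shows only that fusion step on plain Python lists. It assumes the inputs have already been resized to a common resolution, and it does not model the GLSA module or the exact BDEO-YOLO topology, which the abstract does not describe; the feature-map names are hypothetical.

```python
def fast_normalized_fusion(inputs, raw_weights, eps=1e-4):
    """Fuse same-sized feature maps as sum_i w_i * I_i / (eps + sum_j w_j)."""
    if len(inputs) != len(raw_weights):
        raise ValueError("expected one weight per input feature map")
    # ReLU keeps each learnable weight non-negative, so every normalized weight lies in [0, 1).
    w = [max(0.0, v) for v in raw_weights]
    denom = eps + sum(w)
    rows, cols = len(inputs[0]), len(inputs[0][0])
    return [[sum(wi * feat[r][c] for wi, feat in zip(w, inputs)) / denom for c in range(cols)]
            for r in range(rows)]

# Hypothetical 2x2 maps, assumed already resized to the same resolution.
lateral = [[3.0, 3.0], [3.0, 3.0]]
top_down = [[1.0, 1.0], [1.0, 1.0]]
print(round(fast_normalized_fusion([lateral, top_down], [1.0, 1.0])[0][0], 4))   # 1.9999
print(round(fast_normalized_fusion([lateral, top_down], [-2.0, 1.0])[0][0], 4))  # 0.9999
```

The second call shows the effect of the ReLU: a negative raw weight is clipped to zero, so that input is ignored rather than subtracted.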
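Efficient Local Attention (ELA) pools a feature map into a height-wise strip and a width-wise strip, runs a 1D convolution over each strip, and uses sigmoid outputs to reweight the original positions. The sketch below follows that pattern for illustration only: it omits the Group Normalization that ELA applies before the sigmoid, assumes one hypothetical 1D kernel shared by both directions and by all channels, and is not the configuration used in BDEO-YOLO.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def conv1d_same(seq, kernel):
    """1D cross-correlation with zero padding; output length equals input length (odd kernel)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(len(kernel))) for i in range(len(seq))]

def ela_channel(x, kernel):
    h, w = len(x), len(x[0])
    # Strip pooling: one average per row (height strip) and per column (width strip).
    z_h = [sum(row) / w for row in x]
    z_w = [sum(x[i][j] for i in range(h)) / h for j in range(w)]
    # 1D convolution along each strip, then sigmoid -> attention values in (0, 1).
    # (ELA also applies Group Normalization at this point; omitted in this sketch.)
    a_h = [sigmoid(v) for v in conv1d_same(z_h, kernel)]
    a_w = [sigmoid(v) for v in conv1d_same(z_w, kernel)]
    # Every position is rescaled by its row attention times its column attention.
    return [[x[i][j] * a_h[i] * a_w[j] for j in range(w)] for i in range(h)]

def ela(feature_maps, kernel):
    """Apply the sketch channel by channel; `feature_maps` is a list of HxW lists."""
    if len(kernel) % 2 != 1:
        raise ValueError("kernel length must be odd to keep the strip length unchanged")
    return [ela_channel(channel, kernel) for channel in feature_maps]

feature = [[[4.0, 8.0], [12.0, 16.0]]]             # one channel, 2x2
print(ela(feature, kernel=[0.0, 0.0, 0.0]))       # [[[1.0, 2.0], [3.0, 4.0]]]
weighted = ela(feature, kernel=[0.25, 0.5, 0.25])  # hypothetical kernel
```

With an all-zero kernel every attention value is sigmoid(0) = 0.5, so each position is simply scaled by 0.25; a non-trivial kernel makes the scaling depend on the row and column statistics, which is what lets the module emphasize locally salient regions such as small targets.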
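The abstract mixes two kinds of comparison: the mAP gain is an absolute difference in percentage points, while the computation saving is a relative reduction. The short check below reproduces both figures from the reported numbers; the baseline mAP it prints is only the value implied by 92.2% minus 1.3 points, not a separately reported result, and the helper names are illustrative.

```python
def relative_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0

def percentage_point_gain(new_pct, old_pct):
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct

# Figures reported in the abstract for STBD-08.
gflops_yolov8n, gflops_bdeo = 8.1, 4.8
map_bdeo, reported_gain_pp = 92.2, 1.3

# Relative saving in computation: (8.1 - 4.8) / 8.1.
print(f"GFLOPs reduction: {relative_reduction(gflops_yolov8n, gflops_bdeo):.1f}%")  # GFLOPs reduction: 40.7%
# Baseline mAP implied by the reported gain (derived, not separately reported).
implied_baseline = map_bdeo - reported_gain_pp
print(f"Implied YOLOv8n mAP: {implied_baseline:.1f}%")  # Implied YOLOv8n mAP: 90.9%
print(f"Gain: {percentage_point_gain(map_bdeo, implied_baseline):.1f} percentage points")  # Gain: 1.3 percentage points
```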

CLC number: TP391