Computer Science ›› 2024, Vol. 51 ›› Issue (10): 56-66. doi: 10.11896/jsjkx.240400109

• Intelligent Education Technology and Applications •


Perception and Analysis of Teaching Process Based on Video Understanding

DUAN Xinran, WANG Mei, HAN Tianli, ZHOU Hongyu, GUO Junqi, JI Weixing, HUANG Hua   

  1. School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China
  • Received:2024-04-15 Revised:2024-06-27 Online:2024-10-15 Published:2024-10-11
  • Corresponding author: HUANG Hua (huahuang@bnu.edu.cn)
  • About author: DUAN Xinran, born in 2001, undergraduate (202011081033@mail.bnu.edu.cn). His main research interests include computer vision and machine learning.
    HUANG Hua, born in 1975, professor, Ph.D. supervisor, is a member of CCF (No. 09499D). His main research interests include video processing and computer graphics.
  • Supported by:
    National Natural Science Foundation of China (62306043).


Abstract: The classroom is the core venue of education and teaching, and process-level monitoring and evaluation of teachers' instructional activities in the classroom is an effective means of improving teaching quality. However, existing manual evaluation methods suffer from low efficiency, potential disruption of classroom teaching, and subjective error, making it difficult to achieve satisfactory results. Given the rapid development of artificial intelligence (AI) technology, this work proposes integrating human-centered intelligent perception and analysis techniques into the teaching process for real-time recognition and analysis of the teacher. First, a face detection algorithm locates the teacher in real time and supports displacement analysis. Second, a gaze estimation algorithm detects the teacher's region of attention. Finally, skeleton-based action recognition and facial expression recognition are used to perceive and analyze the teacher's actions and expressions. Quantitative statistics over these indicators provide a more efficient and objective view of a teacher's teaching characteristics, helping teachers improve their teaching quality in a targeted way. Experiments under the same configuration environment show that each module of the system performs well on its corresponding task and meets the requirements of teaching scenarios. Evaluation on real-world teaching videos shows that the system can accurately perceive the teacher's instructional state and provide constructive feedback for improving teaching quality.
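
The abstract outlines a frame-by-frame pipeline: locate the teacher with face detection, accumulate displacement, and tally gaze-region, action, and expression labels into quantitative indicators. Below is a minimal Python sketch of that loop under stated assumptions; the four model callables (detect_face, estimate_gaze, recognize_action, recognize_expression) and the label vocabularies are hypothetical placeholders for the modules named in the abstract, not the system's actual interfaces.

```python
from dataclasses import dataclass, field
from collections import Counter
from math import hypot
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # teacher face box as (x1, y1, x2, y2) in pixels


@dataclass
class TeachingStats:
    """Quantitative indicators accumulated over one lesson video."""
    total_displacement: float = 0.0                         # cumulative teacher movement, in pixels
    gaze_regions: Counter = field(default_factory=Counter)  # counts per attention region, e.g. "students"
    actions: Counter = field(default_factory=Counter)       # counts per action label, e.g. "writing"
    expressions: Counter = field(default_factory=Counter)   # counts per expression label, e.g. "smiling"


def box_center(box: BBox) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0


def analyze_video(frames, detect_face, estimate_gaze,
                  recognize_action, recognize_expression) -> TeachingStats:
    """Run the perception modules frame by frame and aggregate the indicators."""
    stats = TeachingStats()
    prev_center: Optional[Tuple[float, float]] = None

    for frame in frames:
        face_box: Optional[BBox] = detect_face(frame)   # locate the teacher in this frame
        if face_box is None:
            continue                                    # teacher not visible; skip the frame

        # Displacement analysis: distance moved since the last detected position.
        cur = box_center(face_box)
        if prev_center is not None:
            stats.total_displacement += hypot(cur[0] - prev_center[0],
                                              cur[1] - prev_center[1])
        prev_center = cur

        # Gaze region, action and expression are tallied into per-class counts,
        # which can later be normalized into per-lesson ratios.
        stats.gaze_regions[estimate_gaze(frame, face_box)] += 1
        stats.actions[recognize_action(frame)] += 1
        stats.expressions[recognize_expression(frame, face_box)] += 1

    return stats
```

A per-lesson indicator such as the share of time spent facing the students could then be read off as stats.gaze_regions["students"] / sum(stats.gaze_regions.values()), which is one example of the quantified statistics the abstract describes.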

Key words: Teaching quality assessment, Video understanding, Displacement estimation, Gaze estimation, Action recognition, Facial expression recognition

CLC Number: TP391