Computer Science ›› 2026, Vol. 53 ›› Issue (2): 67-77. doi: 10.11896/jsjkx.250300026
ZHUO Tienong1, YING Di2, ZHAO Hui2
Abstract: With the continued development of smart education, schools can assess students' learning and teachers' teaching quality by detecting students' classroom engagement, and thereby optimize the teaching system. Previous research has mostly focused on single-modality, single-role feature extraction, yet a teaching classroom is a complex scene involving multiple modalities and multiple mutually influencing roles, so studying student classroom engagement from a multimodal, multi-role perspective is of great significance. However, effectively modeling temporal correlation and semantic interaction across modalities, and capturing the mutual influence among roles, remain major challenges for assessing student classroom engagement. To address these problems, this work constructs a student classroom engagement dataset containing teacher audio and student video, and proposes a multimodal, multi-role Long-Short Context Model (LSCM) for student classroom engagement assessment. Here, "multimodal" refers to student video and teacher audio, while "multi-role" refers to student-student and student-teacher relations. The model consists of two main modules: a long-term context module and a short-term context module. The long-term context module extracts a single student's long-term behavioral features through audio self-attention and visual self-attention, and strengthens the association between audio and visual information via audio-visual cross-attention; the short-term context module focuses on local time segments to capture the dynamic changes in the engagement of multiple students in the classroom. Finally, the model outputs an engagement class for each student in the video. Experiments show that, by effectively exploiting the complementarity of multimodal data and the correlations among roles, the proposed method significantly improves engagement detection accuracy over existing methods, validating the effectiveness of multimodal fusion and role-interaction modeling.
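The audio-visual cross-attention step described above can be illustrated with a minimal NumPy sketch, in which a student's visual feature sequence queries the teacher's audio feature sequence. This is an assumption-based illustration of generic scaled dot-product cross-attention, not the authors' LSCM implementation; the shapes, dimensions, and function names here are invented for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, key_value_feats, d_k):
    """Scaled dot-product cross-attention: one modality queries another.

    query_feats:     (T_q, d_k)  e.g. a student's visual features over time
    key_value_feats: (T_kv, d_k) e.g. the teacher's audio features
    Returns:         (T_q, d_k)  visual features enriched with audio context
    """
    scores = query_feats @ key_value_feats.T / np.sqrt(d_k)  # (T_q, T_kv)
    weights = softmax(scores, axis=-1)   # attention over audio time steps
    return weights @ key_value_feats     # audio-aware visual representation

rng = np.random.default_rng(0)
visual = rng.standard_normal((16, 64))  # 16 video time steps, 64-dim features
audio = rng.standard_normal((40, 64))   # 40 audio time steps, 64-dim features
fused = cross_attention(visual, audio, d_k=64)
print(fused.shape)  # (16, 64)
```

In this setup each video time step attends over all audio time steps, so the fused sequence keeps the video's temporal length while incorporating context from the teacher's speech; a symmetric pass (audio querying video) would complete a bidirectional fusion.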