Computer Science ›› 2026, Vol. 53 ›› Issue (6): 69-76.doi: 10.11896/jsjkx.250600189

• Intelligent Education Technology • Previous Articles     Next Articles

From Recognition to Generation:Natural Language Expression of Student Attention in OnlineLearning Contexts

XIE Congcong, AN Yuxuan, WANG Di, LUO Xuemei, WANG Yifeng   

  1. School of Computer Science and Technology,Xidian University,Xi'an 710126,China
  • Received:2025-06-26 Revised:2025-09-30 Online:2026-06-15 Published:2026-06-09
  • About author:XIE Congcong,born in 2000,postgra-duate.His main research interests include machine learning and computer vision.
    WANG Di,born in 1989,Ph.D,professor,is a member of CCF(No.65462S).Her main research interests include machine learning and multimedia information retrieval.
  • Supported by:
    National Science and Technology Major Project(2022ZD0117103),National Natural Science Foundation of China(62577041) and Fundamental Research Funds for the Central Universities (QTZX23084).

Abstract: The continuous advancement of artificial intelligence technologies has accelerated the intelligent transformation of education,with student behavior analysis emerging as a key research area supporting precision teaching and personalized learning.However,existing approaches often rely on specialized models for behavior feature extraction and classification,with outputs typically presented as abstract labels,lacking interpretability and intuitiveness.To enable natural language descriptions of students' attentiveness in online learning scenarios,this paper constructs a vision-language alignment dataset for online education contexts,consisting of student learning images paired with corresponding attention-related behavior descriptions.The dataset includes both single-frame images and multi-frame image sequences.Building upon this dataset,it proposes a multimodal fine-tuning method tailored for the task of describing students' attentive behaviors.Experiments are conducted on the Qwen2.5-VL-3B and Qwen2.5-VL-7B vision-language models.The proposed method incorporates prompt design based on head pose,gaze direction,and facial expressions to guide the model in learning attention-related features.Furthermore,this paper introduces an attention-perception loss to enhance the model's understanding of student behavior.Experimental results demonstrate that the fine-tuned models achieve superior accuracy in describing student attentiveness compared to existing vision-language models.

Key words: Visual language model, Student behavior analysis, Online learning, Engagement access

CLC Number: 

  • TP183
[1]APICELLA A,ARPAIA P,FROSOLONE M,et al.EEG-based measurement system for monitoring student engagement in learning 4.0 [J].Scientific Reports,2022,12(1):5857.
[2]DOWNING C E,SPEARS J,HOLTZ M.Transforming a course to blended learning for student engagement [J].Education Research International,2014,2014(1):430732.
[3]HU M,LI H.Student engagement in online learning:A review[C]//Proceedings of the 2017 International symposium on educational technology.IEEE,2017.
[4]FREDRICKS J A,BLUMENFELD P C,PARIS A H.School engagement:Potential of the concept,state of the evidence [J].Review of Educational Research,2004,74(1):59-109.
[5]LI S,LAJOIE S P,ZHENG J,et al.Automated detection of cognitive engagement to inform the art of staying engaged in problem-solving [J].Computers & Education,2021,163:104114.
[6]PSALTIS A,APOSTOLAKIS K C,DIMITROPOULOS K,et al.Multimodal student engagement recognition in prosocial games [J].IEEE Transactions on Games,2017,10(3):292-303.
[7]TRAN T T,NAGIRIKANDALAGE P.Insights into enhancing student engagement:A practical application of blended learning [J].The International Journal of Management Education,2025,23(2):101167.
[8]ZALETELJ J,KOŠIR A.Predicting students' attention in the classroom from Kinect facial and body features [J].EURASIP Journal on Image and Video Processing,2017,2017:1-12.
[9]JIA Q,HE J.Student Behavior Recognition in Classroom Based on Deep Learning [J].Applied Sciences,2024,14(17):7981.
[10]MEHTA N K,PRASAD S S,SAURAV S,et al.Three-dimensional DenseNet self-attention neural network for automatic detection of student's engagement [J].Applied Intelligence,2022,52(12):13803-13823.
[11]LI J,LI D,XIONG C,et al.Blip:Bootstrapping language-image pre-training for unified vision-language understanding and generation[C]//Proceedings of the International Conference on Machine Learning.2022.
[12]ACHIAM J,ADLER S,AGARWAL S,et al.Gpt-4 technical report [J].arXiv:23030.8774,2023.
[13]BAI J,BAI S,YANG S,et al.Qwen-vl:A frontier large vision-language model with versatile abilities [J].arXiv:2308.12966,2023.
[14]YU Z,XIE M,GAO J,et al.From Raw Video to Pedagogical Insights:A Unified Framework for Student Behavior Analysis [C]//Proceedings of the Proceedings of the AAAI Conference on Artificial Intelligence.2024.
[15]REVERDY J,RUSSELL S O C,DUQUENNE L,et al.Roomreader:A multimodal corpus of online multiparty conversational interactions[C]//Proceedings of the Thirteenth Language Resources and Evaluation Conference.2022.
[16]WANG H,GAO C,FU H,et al.Automated student classroom behaviors' perception and identification using motion sensors [J].Bioengineering,2023,10(2):127.
[17]LECUN Y,BOTTOU L,BENGIO Y,et al.Gradient-basedlearning applied to document recognition [J].Proceedings of the IEEE,1998,86(11):2278-324.
[18]ELMAN J L.Finding structure in time [J].Cognitive Science,1990,14(2):179-211.
[19]HE K,ZHANG X,REN S,et al.Deep residual learning forimage recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016.
[20]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition [J].arXiv:1409.1556,2014.
[21]ALAKWAA W,NASSEF M,BADR A.Lung cancer detection and classification with 3D convolutional neural network(3D-CNN)[J].International Journal of Advanced Computer Science and Applications,2017,8(8):409-471.
[22]CHEN T,GUESTRIN C.Xgboost:A scalable tree boosting system[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2016.
[23]ZHENG R,JIANG F,SHEN R.Intelligent student behavior analysis system for real classrooms[C]//Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics,Speech and Signal Processing.IEEE,2020.
[24]PABBA C,KUMAR P.An intelligent system for monitoringstudents' engagement in large classroom teaching through facial expression recognition [J].Expert Systems,2022,39(1):e12839.
[25]MONKARESI H,BOSCH N,CALVO R A,et al.Automateddetection of engagement using video-based estimation of facial expressions and heart rate [J].IEEE Transactions on Affective Computing,2016,8(1):15-28.
[26]ALRUWAIS N M,ZAKARIAH M.Student Recognition and Activity Monitoring in E-Classes Using Deep Learning in Higher Education [J].IEEE Access,2024,12:66110-661128.
[27]HU B,ZHENG L,ZHU J,et al.Teaching plan generation andevaluation with GPT-4:Unleashing the potential of LLM in instructional design [J].IEEE Transactions on Learning Technologies,2024,17:1445-1459.
[28]XU S,WEN H N,PAN H,et al.Classroom Simulacra:Building Contextual Student Generative Agents in Online Education for Learning Behavioral Simulation [J].arXiv:2502.02780,2025.
[29]PATARANUTAPORN P,DANRY V,LEONG J,et al.AI-generated characters for supporting personalized learning and well-being [J].Nature Machine Intelligence,2021,3(12):1013-1022.
[30]MOORE S,NGUYEN H A,BIER N,et al.Assessing the quality of student-generated short answer questions using GPT-3[C]//Proceedings of the European Conference on Technology Enhanced Learning.2022.
[31]HOU R,FÜTTERER T,BÜHLER B,et al.Automated assessment of encouragement and warmth in classrooms leveraging multimodal emotional features and chatgpt[C]//Proceedings of the International Conference on Artificial Intelligence in Education.2024
[32]YU S,ANDROSOV A,YAN H.Exploring the prospects ofmultimodal large language models for Automated Emotion Recognition in education:Insights from Gemini [J].Computers &Education,2025,232:105307.
[33]VINYALS O,TOSHEV A,BENGIO S,et al.Show and tell:A neural image caption generator[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2015.
[34]CHATGPT O.Optimizing language models for dialogue [EB/OL].https://kpzhang.github.io/report/ChatGPT-KZ-Feb2023.pdf.
[35]TOUVRON H,LAVRIL T,IZACARD G,et al.Llama:Openand efficient foundation language models [J].arXiv:2302.13971,2023.
[36]ZHU F,LIU Z,NG X Y,et al.MMDocBench:Benchmarking large vision-language models for fine-grained visual document understanding[J].arXiv:2410.21311,2024.
[37]LUO J,YU H,TAN C,et al.Enhanced Qwen-VL 7B model via instruction finetuning on Chinese medical dataset[C]//2024 5th International Conference on Computer Engineering and Application(ICCEA).IEEE,2024:526-530.
[38]SCHMIDT F D,SCHNEIDER F,BIEMANN C,et al.MVL-SIB:A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching [J].arXiv:2502.12852,2025.
[39]JOSHI S.A Comprehensive Review of Qwen and DeepSeekLLMs:Architecture,Performance and Applications [EB/OL].http://dx.doi.org/10.2139/ssrn.5267655.
[40]SÜMER Ö,GOLDBERG P,D'MELLO S,et al.Multimodal engagement analysis from facial videos in the classroom [J].IEEE Transactions on Affective Computing,2021,14(2):1012-27.
[41]GOLDBERG P,SÜMER Ö,STÜRMER K,et al.Attentive or not? Toward a machine learning approach to assessing students' visible engagement in classroom instruction [J].Educational Psychology Review,2021,33:27-49.
[42]HE Z,HUANG C Y,DING C K,et al.If in a crowdsourced data annotation pipeline,a gpt-4[C]//Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems.2024:1-25.
[43]FUJIMOTO Y,BASHAR K.Automatic classification of multi-attributes from person images using GPT-4 Vision[C]//Proceedings of the 2024 6th International Conference on Image,Video and Signal Processing.2024:207-212.
[44]HOU W,JI Z.Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis.Nature methods [J].Nature Me-thods,2024,21:1462-1465.
[1] XIE Hui, LIANG Dan, YANG Huiting, JIA Chunli, HE Jiangshan, DONG Zexiao, REN Ziqi, JIANG Mingzhe, CHEN Xueli. Research on Adaptive Disciplinary Learning Effectiveness Evaluation Model Driven by PrefrontalEEG [J]. Computer Science, 2026, 53(6): 39-49.
[2] ZHANG Shuai, ZHOU Peng, ZHANG Yanping. Online Capricious Data Stream Learning with Sparse Labels [J]. Computer Science, 2025, 52(6): 139-150.
[3] NING Limiao, WANG Ziming, LIN Zhicheng, PENG Jian, TANG Huajin. Learning Rule with Precise Spike Timing Based on Direct Feedback Alignment [J]. Computer Science, 2025, 52(3): 260-267.
[4] GAO Mengqi, FENG Xiang, YU Huiqun, WANG Mengling. Large-scale Multi-objective Evolutionary Algorithm Based on Online Learning of Sparse Features [J]. Computer Science, 2024, 51(3): 56-62.
[5] HUANG Chunli, LIU Guimei, JIANG Wenjun, LI Kenli, ZHANG Ji, TAK-SHING Peter Yum. Learning Pattern Recognition and Performance Prediction Method Based on Learners'Behavior Evolution [J]. Computer Science, 2024, 51(10): 67-78.
[6] QIN Liang, XIE Liang, CHEN Shengshuang, XU Haijiao. Online Semi-supervised Cross-modal Hashing Based on Anchor Graph Classification [J]. Computer Science, 2023, 50(6): 183-193.
[7] WEI Yan-tao, LUO Jie-lin, HU Mei-jia, LI Wen-hao, YAO Huang. Online Learning Emotion Recognition Based on Videos [J]. Computer Science, 2022, 49(11A): 211000049-6.
[8] LIU Ling-yun, QIAN Hui, XING Hong-jie, DONG Chun-ru, ZHANG Feng. Incremental Classification Model Based on Q-learning Algorithm [J]. Computer Science, 2020, 47(8): 171-177.
[9] KONG Fang, LI Qi-zhi, LI Shuai. Survey on Online Influence Maximization [J]. Computer Science, 2020, 47(5): 7-13.
[10] HE Xiao-wen, HU Yi-fei, WANG Hai-ping, CHEN Mo. Online Learning Nonnegative Matrix Factorization [J]. Computer Science, 2019, 46(6A): 473-477.
[11] WAN Jia-shan, CHEN Lei, WU Jin-hua, GAO Chao. Persona Based Social User Modeling Using KD-Tree [J]. Computer Science, 2019, 46(6A): 442-445.
[12] LI De-quan, DONG Qiao, ZHOU Yue-jin. Distributed Online Conditional Gradient Optimization Algorithm [J]. Computer Science, 2019, 46(3): 332-337.
[13] YANG Hai-min, PAN Zhi-song, BAI Wei. Review of Time Series Prediction Methods [J]. Computer Science, 2019, 46(1): 21-28.
[14] QIN Yi-xiu, WEN Yi-min, HE Qian. Multi-source Online Transfer Learning Algorithm for Classification of Data Streams with Concept Drift [J]. Computer Science, 2019, 46(1): 64-72.
[15] CHEN Jin-yin, FANG Hang, LIN Xiang, ZHENG Hai-bin, YANG Dong-yong, ZHOU Xiao. Personal Learning Recommendation Based on Online Learning Behavior Analysis [J]. Computer Science, 2018, 45(11A): 422-426.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!