Computer Science ›› 2026, Vol. 53 ›› Issue (5): 30-40. doi: 10.11896/jsjkx.250600132

• Intelligent Education Technology •

Application Advantages, Cases and Practical Challenges of Multimodal Technology in the Field of Education

LI Mengge1,2,3, WANG Gang1,2, BAI Wenhao4, LEI Xue3   

    1 School of Information, Xi’an University of Finance and Economics, Xi’an 710100, China
    2 Intellectual Property Collaborative Trustworthy Computing Key Laboratory of Shaanxi Province Universities, Xi’an 710100, China
    3 School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
    4 BYD Automobile Co., Ltd., Xi’an 710100, China
  • Received: 2025-06-20 Revised: 2025-09-09 Published: 2026-05-08
  • About author: LI Mengge, born in 1997, Ph.D, lecturer. Her main research interests include intelligent educational technology and mobile crowdsensing.
    WANG Gang, born in 1974, Ph.D, professor. His main research interests include cloud computing, big data, IoT, trust security and service computing.
  • Supported by:
    National Natural Science Foundation of China (62377031).

Abstract: Education is the cornerstone of national development and national rejuvenation. However, the traditional education model is limited in its teaching methods, resource allocation and evaluation systems: teaching methods are monotonous, educational resources are unevenly distributed and evaluation is one-sided. With the rapid development of artificial intelligence, multimodal technology, an emerging technology that integrates multiple forms of data (such as images, sound and text), offers new ways to address these problems. Through applications such as smart classrooms and personalized learning systems, multimodal technology can comprehensively perceive and understand the learning environment, thereby overcoming the limitations of the traditional model, enhancing the learning experience, promoting educational equity and enabling personalized learning evaluation. First, this paper outlines the definition, connotation and core algorithms of multimodal technology, and traces its development and its position in the field of artificial intelligence. Second, it analyzes the advantages of applying multimodal technology in education along multiple dimensions and discusses them in depth through specific cases. Finally, it discusses the challenges that multimodal technology faces in educational applications, such as data privacy, technical costs and ethical issues. By examining these applications and challenges, this paper aims to provide a theoretical basis and practical guidance for educational innovation, and to promote the development of education in a more intelligent, personalized and equitable direction.

Key words: Multimodal technology, Educational innovation, Application advantages, Challenges

CLC Number: 

  • TP391