Computer Science ›› 2026, Vol. 53 ›› Issue (5): 90-98. doi: 10.11896/jsjkx.250600183

• Intelligent Education Technology •

Innovative Automated Scoring Based on Large Language Models

WANG Shenghui, LI Teng   

  1. School of Artificial Intelligence, Anhui University, Hefei 230601, China
  • Received: 2025-06-26   Revised: 2025-08-28   Published: 2026-05-08
  • About the authors: WANG Shenghui, born in 2000, postgraduate. His main research interests include the application of large language models and computer vision.
    LI Teng, born in 1980, Ph.D., professor, Ph.D. supervisor. His main research interests include computer vision and pattern recognition.

Abstract: Innovative automated scoring (IAS) is crucial in education. Traditional scoring is subjective, inefficient, and lacks uniform standards, and the rapid progress of large language models offers new solutions. This study builds a high-quality dataset, WAIS, and presents a semantic-driven hierarchical topic extraction algorithm. The algorithm runs in four phases (semantic chunking, basic topic extraction, optimized analysis, and topic fusion) and improves the model's ability to extract themes from student answers, enabling automatic topic extraction. It provides a solid basis for automated scoring and establishes an explainable cognitive framework for the subsequent scoring step. The study compares three prompting strategies, Zero-shot, Few-shot, and Chain-of-Thought (CoT), and evaluates them on several pre-trained models. The results show that CoT performs best. The DeepSeek-R1 model achieves 68% accuracy. After fine-tuning, the smaller-parameter model Qwen1.5-7B reaches 83% accuracy, even slightly surpassing the larger-parameter model that relies on prompting alone in innovative scoring tasks. This indicates that using large language models for innovative automated scoring is feasible and has considerable potential for development.

Key words: Large language models, Innovation, Automated scoring, Prompt engineering, Supervised fine-tuning

CLC Number: 

  • TP391
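
The three prompting strategies named in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' method: the task wording in `TASK`, the 1 to 5 score scale, the helper names, and the use of exact-match accuracy as the metric. The abstract does not give the actual prompts or define how accuracy is computed.

```python
from typing import List, Tuple

# Hypothetical task wording and 1-5 scale; the paper's real prompts are not in the abstract.
TASK = "Rate the innovativeness of the student answer on a scale of 1 to 5."


def zero_shot(answer: str) -> str:
    """Zero-shot: the task instruction and the answer, with no examples."""
    return f"{TASK}\n\nStudent answer: {answer}\nScore:"


def few_shot(answer: str, examples: List[Tuple[str, int]]) -> str:
    """Few-shot: prepend a handful of already-scored answers as demonstrations."""
    shots = "\n\n".join(f"Student answer: {a}\nScore: {s}" for a, s in examples)
    return f"{TASK}\n\n{shots}\n\nStudent answer: {answer}\nScore:"


def chain_of_thought(answer: str) -> str:
    """CoT: ask for intermediate reasoning before the final score."""
    return (
        f"{TASK}\n\nStudent answer: {answer}\n"
        "First list the main topics in the answer, then explain how novel each "
        "one is, and finally give the result on a new line as 'Score: <number>'."
    )


def accuracy(predicted: List[int], gold: List[int]) -> float:
    """Exact-match accuracy (assumed metric): share of predictions equal to the gold score."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

Each builder returns a plain string that could be sent to any chat or completion model. Parsing the score out of the model's reply is left out on purpose, since it depends on the model and output format used.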