Innovative Automated Scoring Based on Large Language Models

doi:10.11896/jsjkx.250600183

Abstract

Innovative automated scoring (IAS) is crucial in education. Traditional scoring is subjective, inefficient, and lacks uniform standards, and the rapid progress of large language models offers new solutions. This study builds a high-quality dataset, WAIS, and presents a semantic-driven hierarchical topic extraction algorithm. Through four phases (semantic chunking, basic topic extraction, optimized analysis, and topic fusion), the algorithm improves the model's ability to extract themes from student answers and makes topic extraction automatic. This provides a solid basis for automated scoring and establishes an explainable cognitive framework for the subsequent scoring step. The study compares three prompting strategies, Zero-shot, Few-shot, and Chain-of-Thought (CoT), and evaluates them on several pre-trained models. The results show that CoT is superior, with the DeepSeek-R1 model achieving 68% accuracy. After fine-tuning, the smaller-parameter model Qwen1.5-7B reaches 83% accuracy on innovative scoring tasks, even slightly surpassing the larger-parameter model that uses prompting alone. This indicates that using large language models for innovative automated scoring is feasible and has great potential for development.
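To make the difference between the three prompting strategies concrete, the following is a minimal Python sketch of how Zero-shot, Few-shot, and CoT prompts for an innovativeness score might be assembled. The instruction text, the 1 to 5 score scale, and every name (`TASK`, `Example`, `zero_shot`, `few_shot`, `chain_of_thought`) are assumptions made for illustration; they are not the prompts or the scale used in the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Example:
    """A hypothetical scored answer used as a few-shot demonstration."""
    answer: str
    score: int


# Hypothetical instruction and scale; the study's actual wording is not given here.
TASK = "Rate the innovativeness of the student answer on a scale from 1 to 5."


def zero_shot(answer: str) -> str:
    """Instruction plus the answer, with no demonstrations."""
    return f"{TASK}\n\nStudent answer: {answer}\nScore:"


def few_shot(answer: str, examples: Sequence[Example]) -> str:
    """Instruction, a few scored demonstrations, then the answer to score."""
    shots = "\n\n".join(
        f"Student answer: {ex.answer}\nScore: {ex.score}" for ex in examples
    )
    return f"{TASK}\n\n{shots}\n\nStudent answer: {answer}\nScore:"


def chain_of_thought(answer: str) -> str:
    """Instruction that asks for step-by-step reasoning before the score."""
    return (
        f"{TASK}\n"
        "First list the main topics in the answer, then judge how novel each "
        "topic is, and only then give the final score on a line starting "
        "with 'Score:'.\n\n"
        f"Student answer: {answer}\nReasoning:"
    )
```

Each function returns a plain string that could be sent to any of the evaluated models. The CoT variant differs only in asking for intermediate reasoning over the extracted topics before the final score.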

Key words: Large language models, Innovation, Automated scoring, Prompt engineering, Supervised fine-tuning

CLC Number: TP391

WANG Shenghui, LI Teng. Innovative Automated Scoring Based on Large Language Models [J]. Computer Science, 2026, 53(5): 90-98.

