Computer Science, 2017, Vol. 44, Issue (Z11): 102-105. doi: 10.11896/j.issn.1002-137X.2017.11A.020

• Intelligent Computing •

Automated Scoring Chinese Subjective Responses Based on Improved-LDA

LUO Hai-jiao and KE Xiao-hua

  1. Cisco School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006; Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510006
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the Natural Science Foundation of Guangdong Province.

Abstract: Automated scoring of subjective responses (ASSR) holds great promise for providing diagnostic information and improving reliability in language learning and testing. In the present study, we introduce latent Dirichlet allocation (LDA), a topic model, into an automated scoring task with Chinese subjective responses and propose an improved LDA model that incorporates expert knowledge. The model uses a text feature representation that integrates the document-latent topic probability vector with the latent topic-core term probability vector. Experiments comparing the improved LDA with latent semantic analysis (LSA) show that the improved LDA outperforms LSA in automated scoring performance. The findings highlight the role of model selection in the automated scoring of Chinese responses in language testing.

Key words: Automated subjective question scoring, Latent semantic analysis (LSA), Latent Dirichlet allocation (LDA), Absolute accuracy rate, Adjacent accuracy rate
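The two agreement measures named in the key words are not defined on this page. As a minimal sketch, assuming the usual convention in automated scoring (absolute agreement means the machine score equals the human score, adjacent agreement means the two differ by at most one point), they could be computed as follows. The function name `agreement_rates` and the score lists are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def agreement_rates(human: Sequence[int], machine: Sequence[int]) -> Tuple[float, float]:
    """Return (absolute, adjacent) agreement between two lists of integer scores.

    Assumption: "adjacent" means the scores differ by at most one point;
    the paper may define the threshold differently.
    """
    if len(human) != len(machine) or len(human) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(human)
    # Count responses where the two scores match exactly.
    exact = sum(1 for h, m in zip(human, machine) if h == m)
    # Count responses where the two scores are within one point of each other.
    near = sum(1 for h, m in zip(human, machine) if abs(h - m) <= 1)
    return exact / n, near / n


# Made-up scores for illustration only; these are not data from the study.
human_scores = [3, 4, 2, 5, 1]
machine_scores = [3, 3, 2, 3, 1]
absolute, adjacent = agreement_rates(human_scores, machine_scores)
print(f"absolute agreement: {absolute:.2f}, adjacent agreement: {adjacent:.2f}")
```

Under this convention, adjacent agreement is never lower than absolute agreement, since every exact match also counts as an adjacent one.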

