Computer Science, 2019, Vol. 46, Issue (3): 234-241. doi: 10.11896/j.issn.1002-137X.2019.03.035

• Artificial Intelligence •

English Automated Essay Scoring Methods Based on Discourse Structure

ZHOU Ming1,3, JIA Yan-ming2, ZHOU Cai-lan1, XU Ning1,3

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
  2. Research Center for Artificial Intelligence and Big Data, Global Wisdom Inc, Beijing 100085, China
  3. Hubei Key Laboratory of Transportation Internet of Things, Wuhan University of Technology, Wuhan 430070, China
  • Received: 2018-01-24 Revised: 2018-05-13 Online: 2019-03-15 Published: 2019-03-22

Abstract: Automated essay scoring (AES) is the computer technology that evaluates and scores compositions, drawing on statistics, natural language processing, linguistics and other fields. Discourse structure analysis is both an important research area of natural language processing and an important component of AES systems. AES systems are now widely used. However, research on essay structure remains insufficient, and existing AES systems are not designed with Chinese students in mind. Domestic research on AES is still in its infancy and overlooks the importance of discourse structure in essay scoring. To address these problems, this paper proposes an automated essay scoring method based on discourse structure. First, the method extracts essay features, such as vocabulary, lexical and discourse-structure features, at the word, sentence and paragraph levels. Then the components of each essay are classified with support vector machines, random forests and extreme gradient boosting, and a linear regression model built on the discourse elements is constructed to score the essays. Experimental results show that the accuracy of random-forest-based discourse element identification (DEI-RF) reaches 94.13%, and the mean squared error of the linear-regression-based automated discourse structure scoring model (DSS-LR) reaches 0.02, 0.11 and 0.08 on introduction, argumentation and concession, respectively.

Key words: Automated essay scoring, Discourse element, Discourse structure analysis, Linear regression, Natural language processing, Random forest

CLC Number: TP391.1
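
The two evaluation quantities named in the abstract, classification accuracy for discourse element identification and mean squared error for the regression-based scoring, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not list the actual features or model settings, so the single feature (count of argumentation sentences), the toy scores, and the hard-coded predicted labels below are all hypothetical placeholders, and a one-feature ordinary least squares fit stands in for the paper's linear regression model.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares fit of y = a * x + b for a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b


def mean_squared_error(y_true, y_pred):
    """Average squared difference between reference and predicted scores."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def accuracy(labels_true, labels_pred):
    """Fraction of discourse elements whose predicted label matches."""
    matches = sum(t == p for t, p in zip(labels_true, labels_pred))
    return matches / len(labels_true)


# Hypothetical toy data (assumption): one feature per essay and a
# human-assigned sub-score for the argumentation part.
n_argument_sentences = [2, 4, 6, 8]
human_scores = [1.0, 2.0, 3.0, 4.0]

a, b = fit_linear(n_argument_sentences, human_scores)
predicted_scores = [a * x + b for x in n_argument_sentences]
mse = mean_squared_error(human_scores, predicted_scores)

# Hypothetical labels (assumption): in a real system the predictions would
# come from a trained classifier such as a random forest.
gold_labels = ["introduction", "argumentation", "argumentation", "concession"]
pred_labels = ["introduction", "argumentation", "concession", "concession"]
acc = accuracy(gold_labels, pred_labels)

print(f"slope={a:.2f} intercept={b:.2f} mse={mse:.4f} accuracy={acc:.2%}")
```

In practice the regression would use many discourse-structure features and be evaluated on held-out essays rather than on its own training data, as it is in this toy example.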