Computer Science, 2019, Vol. 46, Issue (3): 234-241. doi: 10.11896/j.issn.1002-137X.2019.03.035

Artificial Intelligence

English Automated Essay Scoring Methods Based on Discourse Structure

ZHOU Ming (1,3), JIA Yan-ming (2), ZHOU Cai-lan (1), XU Ning (1,3)

1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
2. Research Center for Artificial Intelligence and Big Data, Global Wisdom Inc, Beijing 100085, China
3. Hubei Key Laboratory of Transportation Internet of Things, Wuhan University of Technology, Wuhan 430070, China
Received: 2018-01-24; Revised: 2018-05-13; Online: 2019-03-15; Published: 2019-03-22

Abstract: Automated essay scoring (AES) is the computer technology that evaluates and scores essays using techniques from statistics, natural language processing, linguistics and other fields. Discourse structure analysis is not only an important research area in natural language processing but also an important component of an AES system. AES systems are now widely used. However, essay structure has received too little research attention, existing AES systems are not designed with Chinese students in mind, and domestic research on AES is still in its infancy and overlooks the importance of discourse structure in essay scoring. To address these problems, this paper proposes an automated essay scoring method based on discourse structure. First, the method extracts vocabulary, lexical and discourse-structure features of each essay at the word, sentence and paragraph levels. Then, the discourse elements of each essay are classified with support vector machines (SVM), random forests (RF) and extreme gradient boosting (XGBoost), and a linear regression model over the discourse elements is constructed to score the essays. Experimental results show that discourse element identification based on random forest (DEI-RF) reaches an accuracy of 94.13%, and that the automated discourse structure scoring model based on linear regression (DSS-LR) achieves mean squared errors of 0.02, 0.11 and 0.08 on introduction, argumentation and concession, respectively.
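The scoring stage summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' DSS-LR implementation: it assumes every sentence has already been labeled with a discourse element (for example by a classifier such as DEI-RF), turns those labels into per-essay counts, fits an ordinary least-squares linear regression through the normal equations, and reports the mean squared error. The label names (`thesis`, `support`), the helper functions and the toy essays and scores are all hypothetical; the paper's actual feature set and label inventory are not given here.

```python
from typing import List, Sequence

# Hypothetical discourse-element labels; the paper's real label set is not given here.
ELEMENT_TYPES = ("thesis", "support")


def element_counts(labels: Sequence[str], element_types: Sequence[str]) -> List[float]:
    """Count how many sentences carry each discourse-element label."""
    return [float(sum(1 for lab in labels if lab == t)) for t in element_types]


def fit_linear_regression(X: List[List[float]], y: List[float], ridge: float = 1e-8) -> List[float]:
    """Least squares via the normal equations (X^T X) w = X^T y.

    A bias column of ones is prepended, so the result is [bias, w1, ..., wd].
    A tiny ridge term is added to the diagonal for numerical stability.
    """
    rows = [[1.0] + list(x) for x in X]
    d = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    # Forward elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def predict(w: List[float], x: List[float]) -> float:
    """Linear score: bias plus weighted feature sum."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def mean_squared_error(y_true: List[float], y_pred: List[float]) -> float:
    """Average squared difference between reference and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy essays given as per-sentence labels, assumed to come from a discourse
# element classifier; they are illustrative only, not data from the paper.
ESSAYS = [
    ["thesis", "support", "support", "conclusion"],
    ["support", "other"],
    ["thesis", "other"],
    ["support", "support", "support"],
    ["thesis", "support", "conclusion"],
]
# Toy structure scores, constructed as 0.5 + 0.3 * thesis + 0.1 * support.
SCORES = [1.0, 0.6, 0.8, 0.8, 0.9]

X = [element_counts(e, ELEMENT_TYPES) for e in ESSAYS]
weights = fit_linear_regression(X, SCORES)
mse = mean_squared_error(SCORES, [predict(weights, x) for x in X])
print("weights:", [round(v, 3) for v in weights], "training MSE:", mse)
```

Because the toy scores are an exact linear function of the two counts, the fit recovers weights of roughly 0.5, 0.3 and 0.1 with near-zero error. The per-dimension MSEs reported in the abstract suggest that each structural dimension (introduction, argumentation, concession) is scored separately; in that setting the error would be measured on held-out essays rather than on the training set. The solver is written with the standard library only to stay self-contained; a numerical library's least-squares routine would be the more robust choice in practice.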

Key words: Automated essay scoring, Discourse element, Discourse structure analysis, Natural language processing, Random forest, Linear regression


CLC Number: TP391.1