Computer Science (计算机科学), 2019, Vol. 46, Issue (3): 234-241. doi: 10.11896/j.issn.1002-137X.2019.03.035
ZHOU Ming (1,3), JIA Yan-ming (2), ZHOU Cai-lan (1), XU Ning (1,3)
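Abstract: Automated Essay Scoring (AES) refers to systems that evaluate and score essays using techniques from statistics, natural language processing, linguistics, and related fields. Discourse structure analysis is an important research direction in natural language processing and a key component of AES systems. AES systems developed outside China are widely used, but research on scoring discourse structure remains insufficient, and these systems are not tailored to English essays written by Chinese students. In China, research on automated scoring of English essays is still at an early stage and has overlooked the importance of discourse structure. To address these problems, this paper proposes a discourse-structure-based method for automatically scoring English essays. The method extracts lexical, syntactic, and structural features at three levels (word, sentence, and paragraph), classifies discourse elements with support vector machines (SVM), random forests (RF), and extreme gradient boosting (XGBoost), and finally builds a linear regression model to score the discourse structure of each essay. Experimental results show that the random-forest-based discourse element identification model (Discourse Element Identification based on Random Forest, DEI-RF) reaches an accuracy of 94.13%, and that the linear-regression-based discourse structure scoring model (Discourse Structures Scoring based on Linear Regression, DSS-LR) achieves mean squared errors of 0.02, 0.11, and 0.08 on Introduction, Argumentation, and Concession paragraphs, respectively.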
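The abstract states that features are extracted at the word, sentence, and paragraph levels but does not list them. The following is a minimal sketch of how such a multi-level feature vector could be assembled, assuming (hypothetically) that each paragraph is the unit being described. The marker lexicon `DISCOURSE_MARKERS`, the naive regex sentence splitter, and every feature name (`marker_count`, `avg_sentence_len`, `relative_position`, and so on) are illustrative assumptions, not the authors' feature set.

```python
import re
from typing import Dict, List

# Hypothetical word-level cue lexicon; the paper's actual features are not given in the abstract.
DISCOURSE_MARKERS = ["however", "although", "firstly", "secondly",
                     "therefore", "admittedly", "in conclusion"]


def split_sentences(paragraph: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def paragraph_features(paragraph: str, index: int, total: int) -> Dict[str, float]:
    """Build a toy feature vector for one paragraph at word, sentence and paragraph level."""
    lowered = paragraph.lower()
    words = re.findall(r"[a-z']+", lowered)
    sentences = split_sentences(paragraph)

    # Word level: lexical cues (whole-word matches, so multi-word markers also work).
    marker_count = sum(
        len(re.findall(r"\b" + re.escape(m) + r"\b", lowered)) for m in DISCOURSE_MARKERS
    )

    # Sentence level: sentence count and average sentence length in words.
    num_sentences = len(sentences)
    avg_sentence_len = len(words) / num_sentences if num_sentences else 0.0

    # Paragraph level: position of the paragraph in the essay (0.0 = first, 1.0 = last).
    relative_position = index / (total - 1) if total > 1 else 0.0

    return {
        "marker_count": float(marker_count),
        "num_words": float(len(words)),
        "num_sentences": float(num_sentences),
        "avg_sentence_len": avg_sentence_len,
        "relative_position": relative_position,
        "is_first": 1.0 if index == 0 else 0.0,
        "is_last": 1.0 if index == total - 1 else 0.0,
    }


essay = [
    "Many people believe that technology isolates us. In this essay I will discuss both sides.",
    "Firstly, online tools connect distant friends. Therefore, isolation is not inevitable.",
    "Admittedly, some users spend too much time online. However, the benefits outweigh the costs.",
]

features = [paragraph_features(p, i, len(essay)) for i, p in enumerate(essay)]
for f in features:
    print(f)
```

In a fuller pipeline, vectors like these (plus syntactic features from a parser) would be passed to the classifiers the abstract compares (SVM, random forest, XGBoost) to label discourse elements; which unit is classified (sentence or paragraph) is not specified in the abstract.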
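The scoring stage described in the abstract fits a linear regression and evaluates it by mean squared error, MSE = (1/n) * sum_i (y_i - y_hat_i)^2. The sketch below shows ordinary least squares solved through the normal equations (X^T X) w = X^T y in pure Python, assuming each paragraph is summarized by two hypothetical numeric features. The data are invented so that the score is exactly 0.5 + 2*f1 + 1*f2, which makes the fitted weights easy to check; they are not the paper's data, and the function names are illustrative.

```python
from typing import List


def solve_linear_system(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve m @ x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(x: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations; returns [intercept, w1, ..., wk]."""
    design = [[1.0] + row for row in x]  # prepend a column of ones for the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(w: List[float], row: List[float]) -> float:
    """Linear score: intercept plus weighted sum of the features."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def mean_squared_error(y_true: List[float], y_pred: List[float]) -> float:
    """Average squared difference between reference and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented toy data: two hypothetical per-paragraph features and a structure score.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
y = [0.5, 2.5, 1.5, 3.5, 1.7]

weights = fit_ols(X, y)
preds = [predict(weights, row) for row in X]
error = mean_squared_error(y, preds)
print(weights, error)  # weights close to [0.5, 2.0, 1.0], error close to 0
```

Because the abstract reports a separate MSE for Introduction, Argumentation, and Concession paragraphs, one plausible reading (not confirmed by the abstract) is that a separate regression is fit per paragraph type; with this sketch that would mean calling `fit_ols` once on each type's data.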