Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230900073-6. doi: 10.11896/jsjkx.230900073

• Interdiscipline & Application •

Implementation and Application of Chinese Grammatical Error Diagnosis System Based on CRF

LI Bin1, WANG Haochang2   

  1. 1 School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China
    2 School of Computer and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang 163318, China
  • Published: 2024-06-06
  • Corresponding author: LI Bin (libin31209@163.com)
  • About author: LI Bin, born in 1993, master's degree, lecturer. His main research interests include natural language processing and intelligent education.
  • Supported by:
    National Natural Science Foundation of China (61402099) and Natural Science Foundation of Heilongjiang Province of China (LH2021F004).

Abstract: As China's international influence and the global status of the Chinese language have grown, the number of foreigners learning Chinese as a second language has increased year by year, and Chinese has become one of the most popular languages in the world. Research on Chinese grammatical error diagnosis has therefore attracted considerable attention. This paper first defines Chinese grammatical error diagnosis and summarizes the current state of research. It then analyzes existing diagnosis methods and builds a Chinese grammatical error diagnosis system based on conditional random fields (CRF), exploring automatic Chinese grammatical error detection and its concrete application workflow to help learners of Chinese study more efficiently. Experimental results on the CGED2016 dataset show that the system performs well at the detection and identification levels but still needs improvement at the position level.

Key words: Chinese grammatical error diagnosis, Sequence labeling, Conditional random field, Natural language processing

CLC Number: 

  • TP391.1
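
The abstract frames grammatical error diagnosis as sequence labeling evaluated at three levels (detection, identification, position). The following is a minimal sketch of that framing only, not the authors' implementation: it assumes per-character BIO tags over the four CGED error types (R redundant, M missing, S selection, W word ordering) and 1-indexed inclusive spans. The function names, the tag scheme, and the example sentence are hypothetical illustrations; the paper's actual features and CRF training setup are not given on this page.

```python
from typing import Dict, List, Tuple

# (start, end, error_type): 1-indexed, inclusive character offsets (assumed format)
Span = Tuple[int, int, str]

# CGED error types: Redundant, Missing, Selection, Word ordering
ERROR_TYPES = {"R", "M", "S", "W"}


def spans_to_bio(sentence: str, spans: List[Span]) -> List[str]:
    """Turn error spans into one BIO tag per character (the CRF's target labels)."""
    tags = ["O"] * len(sentence)
    for start, end, etype in spans:
        if etype not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {etype}")
        if not (1 <= start <= end <= len(sentence)):
            raise ValueError(f"span out of range: {(start, end)}")
        tags[start - 1] = f"B-{etype}"
        for i in range(start, end):  # remaining characters of the span
            tags[i] = f"I-{etype}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover error spans from a predicted tag sequence."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel closes the last span
        if start is not None and tag != f"I-{etype}":
            spans.append((start + 1, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def three_level_view(spans: List[Span]) -> Dict[str, object]:
    """Project spans onto the detection, identification, and position levels."""
    return {
        "detection": bool(spans),                    # is the sentence erroneous?
        "identification": {t for _, _, t in spans},  # which error types occur?
        "position": set(spans),                      # where, and of which type?
    }


if __name__ == "__main__":
    # Hypothetical learner sentence with a redundant second character.
    sentence = "我是很高兴"
    tags = spans_to_bio(sentence, [(2, 2, "R")])
    print(list(zip(sentence, tags)))
    print(three_level_view(bio_to_spans(tags)))
```

Under this sketch, a prediction can be correct at the detection level (the sentence is flagged) and the identification level (the right error type) while still missing at the position level if the span boundaries are off, which is one way to read the gap the abstract reports.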