Chinese Real-word Error Automatic Proofreading Based on Combining of Local Context Features

LIU Liang-liang and CAO Cun-gen   

Online: 2018-12-01   Published: 2018-12-01

Abstract: Analogous to context-sensitive spelling correction in English, a real-word error in Chinese is an error in which one Chinese word is mistakenly used in place of another Chinese word. This paper proposes a method for detecting and correcting Chinese real-word errors based on confusion sets. The method extracts local features around the target word: the left adjacent bigram, the right adjacent bigram, and three trigrams. The bigram and trigram probabilities are computed for each confusion word in the target word's confusion set. A model based on multi-feature fusion is proposed, and rules are used to find the real-word errors. The results are classified into two types: marking the errors and rewriting the errors. In the experiment, 18 confusion sets and a corpus of 20,000 sentences were used to validate the algorithm. The results show that the proposed method can find real-word errors in Chinese texts and produce lists of candidate corrections. The proposed method combines automatic error detection with automatic error correction.
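The local-feature step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the padding token, the function names (`count_ngrams`, `local_ngrams`, `score`, `rank_candidates`), the use of relative frequencies, and the weighted-sum fusion are all assumptions made for illustration, since the abstract does not specify the fusion model or the decision rules. The input is assumed to be a sentence that has already been segmented into words.

```python
from collections import Counter

PAD = "<pad>"  # hypothetical boundary token, not taken from the paper


def count_ngrams(corpus):
    """Count bigrams and trigrams over a corpus of word-segmented sentences."""
    bi, tri = Counter(), Counter()
    for sent in corpus:
        p = [PAD, PAD] + list(sent) + [PAD, PAD]
        bi.update(zip(p, p[1:]))
        tri.update(zip(p, p[1:], p[2:]))
    return bi, tri


def local_ngrams(words, i, candidate):
    """Substitute `candidate` at position i and return the left and right
    adjacent bigrams plus the three trigrams that contain that position."""
    p = [PAD, PAD] + list(words) + [PAD, PAD]
    j = i + 2  # index of the target word after padding
    p[j] = candidate
    bigrams = [tuple(p[j - 1:j + 1]), tuple(p[j:j + 2])]
    trigrams = [tuple(p[j - 2:j + 1]), tuple(p[j - 1:j + 2]), tuple(p[j:j + 3])]
    return bigrams, trigrams


def score(words, i, candidate, bi, tri, w_bi=1.0, w_tri=1.0):
    """Assumed fusion: weighted sum of bigram and trigram relative frequencies."""
    bigrams, trigrams = local_ngrams(words, i, candidate)
    bi_total = sum(bi.values()) or 1
    tri_total = sum(tri.values()) or 1
    p_bi = sum(bi[g] for g in bigrams) / bi_total
    p_tri = sum(tri[g] for g in trigrams) / tri_total
    return w_bi * p_bi + w_tri * p_tri


def rank_candidates(words, i, confusion_set, bi, tri):
    """Return the confusion-set words ordered from best to worst local fit."""
    scored = [(score(words, i, c, bi, tri), c) for c in sorted(confusion_set)]
    scored.sort(key=lambda t: -t[0])
    return [c for _, c in scored]
```

Under these assumptions, a word would be flagged as a possible real-word error when the top-ranked candidate differs from the word actually written, and the ranked list would serve as the correction list. A real system would also need smoothing for unseen n-grams and a threshold or rule set to avoid flagging correct words.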

Key words: Real-word error, Confusion set, Context feature, N-gram model
