Computer Science, 2013, Vol. 40, Issue (2): 120-123.


Research of Chinese Writeprint Recognition Using Semi-random Feature Sampling Algorithm

Online: 2018-11-16. Published: 2018-11-16.

Abstract: Character N-grams can effectively capture an individual author's stylistic information in texts. To deal with the high sparsity and high redundancy of the resulting feature space, this study proposes an ensemble classification algorithm based on semi-random feature sampling. First, the whole feature space is divided into several individual-author feature sets by a divergence rule. Then each of these sets is divided into equally sized subspaces by a semi-random selection method, and a base classifier is trained on each random subspace. Finally, the base classifiers are combined into an ensemble by majority voting. The algorithm was evaluated on a real-life dataset. The results show that, for Chinese writeprint identification, it achieves a considerable improvement in accuracy and robustness over the benchmark techniques (random subspace method, bagging and support vector machine).

Key words: Writeprint, Semi-random feature sampling, Individual feature set, Ensemble classifier, Diversity
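The abstract describes the pipeline only at a high level, so the Python sketch below illustrates its overall shape under stated assumptions. Only the sequence of steps comes from the abstract: character N-gram features, per-author feature sets, equally sized subspaces, one base classifier per subspace, and majority voting. The abstract does not specify the divergence rule, the exact semi-random sampling scheme, or the base classifier. The following are therefore hypothetical choices, not the authors' method: `author_feature_sets` (each N-gram goes to the author with the highest mean count), the chunk-per-author construction in `semi_random_subspaces`, and the nearest-centroid base classifier.

```python
import random
from collections import Counter


def char_ngrams(text, n=2):
    """Count overlapping character n-grams; works unchanged for Chinese text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def author_feature_sets(docs, labels):
    """HYPOTHETICAL stand-in for the divergence rule: give each n-gram to the
    author with the highest mean count of it, yielding disjoint per-author sets."""
    totals, ndocs = {}, Counter(labels)
    for doc, author in zip(docs, labels):
        totals.setdefault(author, Counter()).update(doc)
    vocab = set().union(*totals.values())
    sets = {author: [] for author in totals}
    for gram in sorted(vocab):
        best = max(sorted(totals), key=lambda a: totals[a][gram] / ndocs[a])
        sets[best].append(gram)
    return sets


def semi_random_subspaces(sets, k, seed=0):
    """ASSUMED reading of semi-random sampling: shuffle each author's set, cut it
    into k near-equal chunks, and let subspace j take chunk j of every author,
    so each subspace is random but still draws on every author's set."""
    rng = random.Random(seed)
    subspaces = [[] for _ in range(k)]
    for author in sorted(sets):
        feats = list(sets[author])
        rng.shuffle(feats)
        for j in range(k):
            subspaces[j].extend(feats[j::k])
    return subspaces


def train_centroids(docs, labels, feats):
    """Base classifier (nearest centroid, chosen only for brevity): the mean
    n-gram count vector of each author, restricted to one subspace."""
    sums, ndocs = {}, Counter(labels)
    for doc, author in zip(docs, labels):
        row = sums.setdefault(author, [0.0] * len(feats))
        for i, gram in enumerate(feats):
            row[i] += doc[gram]
    return {a: [v / ndocs[a] for v in row] for a, row in sums.items()}


def predict_one(centroids, doc, feats):
    """Return the author whose centroid is closest in squared Euclidean distance."""
    x = [doc[gram] for gram in feats]
    return min(sorted(centroids),
               key=lambda a: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[a])))


def ensemble_predict(docs, labels, test_doc, k=3, seed=0):
    """Train one base classifier per subspace and combine them by majority vote."""
    sets = author_feature_sets(docs, labels)
    votes = Counter()
    for feats in semi_random_subspaces(sets, k, seed):
        centroids = train_centroids(docs, labels, feats)
        votes[predict_one(centroids, test_doc, feats)] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy corpus (invented for illustration): author A favours 的确, author B favours 然而.
    train = ["的确如此的确不错", "的确很好的确如此", "然而并非然而未必", "然而不好然而并非"]
    labels = ["A", "A", "B", "B"]
    docs = [char_ngrams(t) for t in train]
    print(ensemble_predict(docs, labels, char_ngrams("的确如此的确很好的确不错")))  # → A
```

In this sketch every subspace takes a share of features from each author's set, so no base classifier is blind to any author's vocabulary. This is one plausible motivation for sampling semi-randomly rather than uniformly at random over the whole feature space.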
