采用半随机特征采样算法的中文书写纹识别研究

Computer Science, 2013, Vol. 40, Issue (2): 120-123.

Research of Chinese Writeprint Recognition Using Semi-random Feature Sampling Algorithm

Online: 2018-11-16 Published: 2018-11-16

Abstract

Character n-grams can effectively capture author-specific stylistic information in text. To address the high sparsity and high redundancy of the resulting feature space, this study proposes an ensemble classification algorithm based on semi-random feature sampling. First, the whole feature space is divided into several individual-author feature sets by a divergence rule. Each of these sets is then divided into equally sized subspaces by a semi-random selection method, and a base classifier is trained on each subspace. Finally, the base classifiers are combined into an ensemble by majority voting. The algorithm was evaluated on a real-life dataset. Compared with the benchmark techniques for Chinese writeprint identification (the random subspace method, bagging, and support vector machines), it achieved a considerable improvement in accuracy and robustness.

Key words: Writeprint, Semi-random feature sampling, Individual feature set, Ensemble classifier, Diversity
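
The pipeline outlined in the abstract (per-author feature sets, equally sized subspaces, one base classifier per subspace, majority voting) can be illustrated with a minimal Python sketch. The abstract does not specify the divergence rule, the semi-random selection method, or the base classifier, so everything below is an assumed stand-in: each n-gram goes to the author with the highest relative frequency, each author's set is shuffled and cut into equal chunks, and the base classifier scores authors by histogram overlap. All function names and the toy texts are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def char_ngrams(text, n=2):
    """Count overlapping character n-grams in a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def author_feature_sets(profiles):
    """Assumed stand-in for the divergence rule: assign each n-gram to the
    author with the highest relative frequency (ties go to the first name)."""
    totals = {a: sum(c.values()) for a, c in profiles.items()}
    sets = {a: [] for a in profiles}
    for gram in sorted(set().union(*profiles.values())):
        best = max(sorted(profiles), key=lambda a: profiles[a][gram] / totals[a])
        sets[best].append(gram)
    return sets


def semi_random_subspaces(feature_sets, k, seed=0):
    """Assumed stand-in for semi-random selection: shuffle each author's
    feature set and cut it into k equal chunks. Sampling is random, but only
    within one author's set. Leftover features are dropped to keep sizes equal."""
    rng = random.Random(seed)
    subspaces = []
    for author in sorted(feature_sets):
        feats = list(feature_sets[author])
        rng.shuffle(feats)
        size = len(feats) // k
        if size:
            subspaces.extend(feats[i * size:(i + 1) * size] for i in range(k))
    return subspaces


def base_predict(counts, profiles, subspace):
    """Toy base classifier: pick the author whose profile overlaps the text
    most on this subspace; abstain (None) when nothing matches."""
    t_total = sum(counts.values()) or 1
    scores = {}
    for author in sorted(profiles):
        p_total = sum(profiles[author].values()) or 1
        scores[author] = sum(
            min(counts[g] / t_total, profiles[author][g] / p_total)
            for g in subspace
        )
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def ensemble_predict(text, profiles, subspaces, n=2):
    """Combine the base classifiers by majority vote over non-abstaining votes."""
    counts = char_ngrams(text, n)
    votes = [base_predict(counts, profiles, s) for s in subspaces]
    votes = [v for v in votes if v is not None]
    return Counter(votes).most_common(1)[0][0] if votes else None


# Toy data, invented for illustration only (one short text per author).
profiles = {
    "A": char_ngrams("今天天气很好我们去公园散步"),
    "B": char_ngrams("实验结果表明该算法十分有效"),
}
subspaces = semi_random_subspaces(author_feature_sets(profiles), k=3)
print(ensemble_predict("明天天气很好", profiles, subspaces))
```

In practice each author profile would be built from many texts and the base classifiers would be trained models. The overlap score here only keeps the sketch self-contained and should not be read as the classifier used in the study.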