Computer Science ›› 2011, Vol. 38 ›› Issue (6): 242-245.

• Artificial Intelligence •

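Research of Online Writeprint Identification Based on Ensemble Learning and Genetic Algorithm

SUN Jian-wen (孙建文), YANG Zong-kai (杨宗凯), LIU San-ya (刘三女牙), WANG Pei (王佩)

  1. (National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079); (School of Information Management, Wuhan University, Wuhan 430072)
  • Online: 2018-11-16 Published: 2018-11-16
  • Supported by:
    This work was supported by the National 863 Program of China (2008AA1Z131) and the Fundamental Research Funds for the Central Universities, Central China Normal University (CCNU 09A02006).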

Abstract: Online writeprint identification is a technique for identifying individuals from the textual identity cues they leave behind in online messages. According to previous research, character N-grams are among the most effective feature types for writeprint identification. Their feature space is high-dimensional and contains many redundant features but few irrelevant ones, so nearly every feature is valuable for the task. To address these properties, an ensemble learning approach based on feature subspacing is proposed. The essence of the method is to partition the features into distinct subsets. First, according to a chosen partition granularity, the whole feature set is split into equally sized, disjoint subsets. Then each subset is used to train a base classifier (Multinomial Naive Bayes). Finally, the individual classifiers are aggregated into the ensemble via a combination rule that takes a simple average of the arithmetic mean and the geometric mean. In addition, a genetic algorithm is used to optimize the feature subspacing (i.e., feature subset selection). To examine the approach, an experiment was conducted on a real-world test bed. The results show that the proposed approach is effective and obtains a considerable improvement in accuracy over the benchmark technique in writeprint identification (Support Vector Machine).

Key words: Writeprint identification, Ensemble learning, Genetic algorithm, Feature subset
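
The pipeline described in the abstract (disjoint, equally sized feature subsets; one Multinomial Naive Bayes base classifier per subset; fusion by averaging the arithmetic and geometric means) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, the Laplace smoothing parameter `alpha`, the choice to fuse class posteriors, and the toy data are all assumptions made for illustration. The genetic algorithm search is not shown; a seeded shuffle stands in for one candidate partition that such a search might evaluate.

```python
import math
import random


def partition_features(n_features, subset_size, seed=0):
    """Shuffle feature indices and split them into disjoint, equally sized subsets."""
    if n_features % subset_size != 0:
        raise ValueError("subset_size must divide n_features evenly")
    indices = list(range(n_features))
    random.Random(seed).shuffle(indices)
    return [indices[i:i + subset_size] for i in range(0, n_features, subset_size)]


def train_mnb(X, y, features, alpha=1.0):
    """Fit a multinomial Naive Bayes model (Laplace-smoothed) on one feature subset."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        counts = [sum(r[f] for r in rows) + alpha for f in features]
        total = sum(counts)
        log_prior = math.log(len(rows) / len(y))
        model[label] = (log_prior, [math.log(c / total) for c in counts])
    return model


def predict_proba(model, x, features):
    """Return normalized class posteriors for one count vector x."""
    scores = {
        label: log_prior + sum(x[f] * lp for f, lp in zip(features, log_probs))
        for label, (log_prior, log_probs) in model.items()
    }
    top = max(scores.values())
    unnorm = {label: math.exp(s - top) for label, s in scores.items()}
    z = sum(unnorm.values())
    return {label: v / z for label, v in unnorm.items()}


def fuse(posteriors, eps=1e-300):
    """Per class, average the arithmetic and geometric means of the base posteriors."""
    k = len(posteriors)
    fused = {}
    for label in posteriors[0]:
        probs = [p[label] for p in posteriors]
        arithmetic = sum(probs) / k
        geometric = math.exp(sum(math.log(max(q, eps)) for q in probs) / k)
        fused[label] = 0.5 * (arithmetic + geometric)
    return fused


def ensemble_predict(models, subsets, x):
    """Predict the label with the highest fused score."""
    fused = fuse([predict_proba(m, x, s) for m, s in zip(models, subsets)])
    return max(fused, key=fused.get)


# Toy character N-gram counts for two hypothetical authors (illustrative data only).
X = [[3, 0, 2, 0], [4, 1, 3, 0], [0, 3, 0, 2], [1, 4, 0, 3]]
y = ["a", "a", "b", "b"]
subsets = partition_features(n_features=4, subset_size=2, seed=0)
models = [train_mnb(X, y, s) for s in subsets]
print(ensemble_predict(models, subsets, [5, 0, 2, 0]))
```

In this sketch the quality of the ensemble depends on which features end up together, which is the quantity the abstract says the genetic algorithm optimizes. A search procedure would score many candidate partitions (for example by validation accuracy) instead of using a single shuffle.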
