Computer Science ›› 2009, Vol. 36 ›› Issue (11): 196-199.

• Artificial Intelligence •

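Feature Selection Method Based on Optimized Document Frequency and Beam Search

ZHU Hao-dong, ZHONG Yong

  1. (Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610041); (Graduate University of the Chinese Academy of Sciences, Beijing 100039)
  • Online: 2018-11-16 Published: 2018-11-16
  • Supported by:
    This work was supported by the Sichuan Province Science and Technology Plan Project (2008GZ0003) and the Science and Technology Research Project of the Sichuan Provincial Department of Science and Technology (07GG006-O14).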

Abstract: In text categorization, feature spaces commonly reach tens of thousands of dimensions and can far exceed the number of available training samples. Feature selection is therefore needed to speed up text mining algorithms, reduce memory use, and filter out irrelevant or weakly relevant features. This paper first presents a document frequency method based on minimum word frequency. It then introduces rough sets and proposes an attribute reduction algorithm based on Beam search. Finally, it combines the two into a comprehensive feature selection algorithm: the document frequency method based on minimum word frequency selects features first, and the attribute reduction algorithm then eliminates redundancy, yielding a more representative feature subset. Experimental results show that the comprehensive algorithm is effective.

Key words: Word frequency, Document frequency, Rough set, Beam search, Attribute reduction
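
The abstract does not give the details of either stage, so the following Python sketch is only one plausible reading of the two-stage idea, not the authors' algorithm. It assumes that a document counts toward a term's document frequency only when the term occurs in it at least `min_tf` times, that redundancy is measured by the size of the rough-set positive region, and that the Beam search grows attribute subsets one attribute at a time. All function names, parameters, and thresholds here are hypothetical.

```python
from collections import Counter, defaultdict


def select_by_document_frequency(docs, min_tf=2, min_df=2):
    """Keep terms that occur at least `min_tf` times in at least `min_df` documents."""
    doc_freq = Counter()
    for tokens in docs:
        term_freq = Counter(tokens)
        # Assumed reading of "minimum word frequency": a document only counts
        # toward a term's document frequency if the term is frequent enough in it.
        doc_freq.update(t for t, n in term_freq.items() if n >= min_tf)
    return sorted(t for t, n in doc_freq.items() if n >= min_df)


def positive_region_size(rows, labels, attrs):
    """Count objects whose values on `attrs` determine their label uniquely."""
    seen_labels = defaultdict(set)
    class_size = Counter()
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)  # equivalence class under `attrs`
        seen_labels[key].add(label)
        class_size[key] += 1
    return sum(class_size[k] for k, ys in seen_labels.items() if len(ys) == 1)


def beam_search_reduct(rows, labels, attrs, beam_width=2):
    """Grow attribute subsets level by level, keeping the `beam_width` best.

    Stops at the first subset whose positive region is as large as that of
    the full attribute set (a heuristic reduct, not guaranteed minimal).
    """
    def score(subset):
        return positive_region_size(rows, labels, subset)

    target = score(tuple(attrs))
    beam = [()]
    while True:
        best = max(beam, key=score)
        if score(best) >= target:
            return best
        # Extend every subset in the beam by one unused attribute.
        candidates = {
            tuple(sorted(s + (a,))) for s in beam for a in attrs if a not in s
        }
        beam = sorted(candidates, key=lambda s: (-score(s), s))[:beam_width]


# Toy usage: columns 0 and 1 carry the same information, column 2 is noise.
rows = [(1, 1, 0), (1, 1, 1), (0, 0, 1), (0, 0, 0)]
labels = ["a", "a", "b", "b"]
reduct = beam_search_reduct(rows, labels, [0, 1, 2])
```

In this sketch the first stage would produce the candidate terms, each document would then be encoded as a row of term indicators, and the second stage would drop attributes that add nothing to the positive region. A wider beam explores more subsets at higher cost.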
