Computer Science ›› 2013, Vol. 40 ›› Issue (10): 252-256.

Text Feature Selection Methods Based on Information Gain and Feature Relation Tree

REN Yong-gong, YANG Xue, YANG Rong-jie and HU Zhi-dong

  • Online: 2018-11-16 Published: 2018-11-16

Abstract: When classes and features are unevenly distributed, the classification performance of the traditional information gain algorithm declines sharply. To address this, a text feature selection method based on information gain, UDsIG, is proposed. First, because an uneven class distribution can distort feature selection, features are selected on a per-class basis. Second, feature distribution uniformity is used to reduce the effect on feature selection of features that are unevenly distributed within a class. Third, a feature relation tree model is applied to the features of each class, retaining strongly correlated features and deleting weakly correlated and irrelevant ones. Finally, the best feature subset is obtained with an information gain formula based on weighted dispersion. Comparison experiments show that the method achieves better classification performance.

Key words: Feature selection, Feature relation tree, Information gain, Imbalanced dataset, Dispersion
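
For orientation, the traditional information gain score that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the classical formula IG(t) = H(C) - P(t)H(C|t) - P(¬t)H(C|¬t) only. It is not the UDsIG method, whose weighted-dispersion formula and feature relation tree are not given in the abstract. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Classical IG of a term: H(C) minus the entropy of C given term presence."""
    with_term = [c for d, c in zip(docs, labels) if term in d]
    without_term = [c for d, c in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


# Hypothetical toy corpus: each document is represented as a set of terms.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"market", "team"}]
labels = ["sport", "sport", "finance", "finance"]

ig_goal = information_gain(docs, labels, "goal")  # term separates the classes perfectly
ig_team = information_gain(docs, labels, "team")  # term occurs equally in both classes
```

In this toy corpus "goal" occurs only in one class and so scores the maximum of 1 bit, while "team" is spread evenly across both classes and scores 0. The score is computed over the whole collection, so it says nothing about how evenly a term is distributed within a single class, which is the kind of information the abstract's per-class selection and dispersion weighting are meant to bring in.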

