Computer Science ›› 2012, Vol. 39 ›› Issue (11): 127-130.
Abstract: When classes and features are unevenly distributed, the classification performance of the traditional information gain algorithm drops sharply. To address this, a text feature selection method based on information gain, TDpIG, is proposed. First, features are selected from the dataset on a per-class basis, which reduces the effect of dataset imbalance on feature selection. Second, the information gain weight is computed using the feature occurrence probability, which reduces the interference of low-frequency words with feature selection. Finally, dispersion is used to analyze the information gain within each class, filtering out the relatively redundant features among high-frequency words and further refining the selected features to obtain a uniform and accurate feature subset. Comparative experiments show that the method achieves better classification performance.
Key words: Feature selection, Text classification, Information gain, Redundant feature, Imbalanced dataset
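For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of classical information gain for a single term, IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], and not the TDpIG method itself: the per-class selection, probability weighting, and dispersion steps are not reproduced here. The function names `entropy` and `information_gain` and the toy documents are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Classical information gain of `term` with respect to the class labels.

    docs   : list of sets of tokens, one set per document (hypothetical format)
    labels : list of class labels, aligned with docs
    """
    # Split the class labels by whether the document contains the term.
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Expected class entropy after observing presence/absence of the term.
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Toy data, invented purely for illustration.
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"stock", "market"},
    {"stock", "team"},
]
labels = ["sport", "sport", "finance", "finance"]

ig_goal = information_gain(docs, labels, "goal")  # term separates the classes
ig_team = information_gain(docs, labels, "team")  # term is spread evenly
```

In this toy set, "goal" occurs only in one class and so receives the maximum gain, while "team" occurs equally in both classes and receives none. With imbalanced classes or rare terms these scores become less reliable, which is the weakness the abstract says TDpIG is designed to address.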
URL: https://www.jsjkx.com/EN/Y2012/V39/I11/127