Computer Science, 2016, Vol. 43, Issue (10): 206-210. doi: 10.11896/j.issn.1002-137X.2016.10.039


TSF Feature Selection Method for Imbalanced Text Sentiment Classification

WANG Jie, LI De-yu and WANG Su-ge   

Online: 2018-12-01; Published: 2018-12-01

Abstract: In imbalanced datasets, the skewed distribution of samples is often accompanied by a skewed distribution of features: features that appear frequently in the majority class rarely appear in the minority class. Based on this characteristic of imbalanced feature distributions, we propose a new two-side Fisher (TSF) feature selection method. TSF explicitly controls the combination of positive and negative features and thereby tackles the imbalance problem at the feature level. Experiments are conducted on the imbalanced book review and COAE2014 datasets. The experimental results indicate that TSF is an effective feature selection method for imbalanced problems.

Key words: Imbalanced, Text sentiment classification, Positive and negative feature, Two-side feature selection
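The abstract does not give the TSF scoring formula, so the following is only a minimal sketch of the general two-sided idea, under assumptions that are not taken from the paper. Each term is scored with the standard Fisher discriminant ratio (μ₊ − μ₋)² / (σ₊² + σ₋²). A term is treated as a positive feature when its mean value is higher in the positive class (label 1) and as a negative feature otherwise. The two sides are then ranked and truncated separately, so the counts `k_pos` and `k_neg` fix the positive/negative mix explicitly instead of letting a single ranked list be dominated by majority-class terms. The names `fisher_ratio` and `two_side_select` are hypothetical, and this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def fisher_ratio(pos: Sequence[float], neg: Sequence[float], eps: float = 1e-12) -> float:
    """Standard Fisher ratio (mu_pos - mu_neg)^2 / (var_pos + var_neg) for one feature.

    eps guards against division by zero when both classes have zero variance.
    """
    mu_p = sum(pos) / len(pos)
    mu_n = sum(neg) / len(neg)
    var_p = sum((x - mu_p) ** 2 for x in pos) / len(pos)
    var_n = sum((x - mu_n) ** 2 for x in neg) / len(neg)
    return (mu_p - mu_n) ** 2 / (var_p + var_n + eps)


def two_side_select(
    X: List[List[float]], y: List[int], k_pos: int, k_neg: int
) -> Tuple[List[int], List[int]]:
    """Hypothetical two-sided selection; returns (positive_ids, negative_ids).

    Assumption: a feature whose mean is higher in class 1 is 'positive';
    all others (including ties) are 'negative'. Each side is ranked by its
    Fisher ratio and cut independently at k_pos / k_neg.
    """
    pos_rows = [row for row, label in zip(X, y) if label == 1]
    neg_rows = [row for row, label in zip(X, y) if label == 0]
    positive: List[Tuple[float, int]] = []
    negative: List[Tuple[float, int]] = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos_rows]
        n = [row[j] for row in neg_rows]
        score = fisher_ratio(p, n)
        side = positive if sum(p) / len(p) > sum(n) / len(n) else negative
        side.append((score, j))
    # Highest score first; feature index breaks ties deterministically.
    positive.sort(key=lambda t: (-t[0], t[1]))
    negative.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in positive[:k_pos]], [j for _, j in negative[:k_neg]]


if __name__ == "__main__":
    # Toy data: 2 minority (label 1) and 4 majority (label 0) documents, 4 binary terms.
    X = [[1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 1],
         [0, 1, 0, 0], [0, 1, 1, 1], [1, 1, 0, 0]]
    y = [1, 1, 0, 0, 0, 0]
    print(two_side_select(X, y, k_pos=2, k_neg=2))  # → ([0, 2], [1, 3])
```

Keeping two separate ranked lists is the design choice that makes the positive/negative proportion a tunable parameter; in an imbalanced corpus one might, for example, set `k_pos` and `k_neg` independently rather than taking the global top k.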
