Computer Science, 2016, Vol. 43, Issue (8): 190-193. doi: 10.11896/j.issn.1002-137X.2016.08.038


Multiclass Text Classification by Golden Selection and Support Vector Domain Description

WU De, LIU San-yang and LIANG Jin-jin   

Online: 2018-12-01    Published: 2018-12-01

Abstract: Traditional multiclass text classification methods suffer from heavy computation and long training times. This paper proposes a text classification algorithm based on golden selection and support vector domain description (SVDD). The method uses the TF-IDF formula to compute the relative word frequency of each entry, sorts the entries in descending order, and normalizes the text vector. The golden selection method is then applied for dimension reduction, leaving no more than one redundant sample feature. Finally, SVDD performs the classification, assigning each test text to the class with the smallest relative class distance. Numerical experiments on several datasets show that, compared with “one-against-one” and “one-against-all” support vector machines, the proposed method is more robust, more accurate, and faster to train, making it better suited to large-scale multiclass text classification problems.
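The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the smoothed IDF formula is one common variant, since the abstract does not give the exact formula; each class sphere is approximated by the class centroid and the largest member distance rather than by a trained SVDD; and the golden selection step is omitted because the abstract does not specify it. All function names (`fit_idf`, `vectorize`, `fit_spheres`, `classify`) are hypothetical.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute a smoothed IDF weight per term (assumed variant, not the paper's)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in sorted(df)}


def vectorize(doc, idf):
    """Map a tokenized text to an L2-normalized TF-IDF vector over the IDF vocabulary."""
    tf = Counter(doc)
    length = max(len(doc), 1)
    vec = [tf[t] / length * w for t, w in idf.items()]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ranked_terms(doc, idf):
    """Terms of one text sorted by descending TF-IDF weight."""
    weights = dict(zip(idf, vectorize(doc, idf)))
    return sorted((t for t in weights if weights[t] > 0),
                  key=lambda t: weights[t], reverse=True)


def fit_spheres(vectors, labels):
    """Per class, return (center, radius).

    Simplification: centroid and maximum member distance stand in for the
    center and radius that a trained SVDD would produce.
    """
    spheres = {}
    for label in set(labels):
        members = [v for v, y in zip(vectors, labels) if y == label]
        center = [sum(col) / len(members) for col in zip(*members)]
        radius = max(math.dist(v, center) for v in members)
        spheres[label] = (center, radius or 1e-12)
    return spheres


def classify(vec, spheres):
    """Assign the class with the smallest relative distance (distance / radius)."""
    return min(spheres,
               key=lambda c: math.dist(vec, spheres[c][0]) / spheres[c][1])


# Toy data, invented purely for illustration.
train = [
    ["stock", "market", "trade"],
    ["market", "price", "stock"],
    ["match", "goal", "team"],
    ["team", "player", "goal"],
]
labels = ["finance", "finance", "sport", "sport"]

idf = fit_idf(train)
spheres = fit_spheres([vectorize(d, idf) for d in train], labels)
prediction = classify(vectorize(["stock", "price", "market"], idf), spheres)
print(prediction)
```

Dividing each distance by the class radius is what makes the distance "relative": a text lying slightly outside a small, tight class can still lose to a larger class whose sphere it sits well inside.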

Key words: Multiclass text classification, Golden selection, SVDD, Dimension reduction, Huge text
