Computer Science ›› 2015, Vol. 42 ›› Issue (12): 268-271.

Improvement of C4.5 Algorithm with Free Noise Capacity

WANG Wei, LI Lei and ZHANG Zhi-hong   

  • Online:2018-11-14 Published:2018-11-14

Abstract: To counter the drop in decision tree prediction accuracy on noisy high-dimensional data, this paper uses the noise-free principal component analysis (NFPCA) algorithm to improve the C4.5 algorithm, yielding the NFPCA-in-C4.5 algorithm. On the one hand, the algorithm recasts noise suppression as an optimization problem that balances fitting the data against controlling smoothness, and thereby obtains the principal component space. On the other hand, each time a new node is built during top-down construction of the decision tree, it maps the principal component space back to the original data space, so that feature information is not permanently lost in the dimensionality reduction process. Experimental results show that NFPCA-in-C4.5 reduces both dimensionality and noise, and avoids the significant drop in prediction accuracy caused by information loss and noise.

Key words: High-dimensional data with noise, Noise tolerance, Principal component analysis, C4.5 algorithm

