Computer Science ›› 2015, Vol. 42 ›› Issue (12): 268-271.

• Artificial Intelligence •

Improvement of the C4.5 Algorithm with Noise Tolerance

WANG Wei, LI Lei and ZHANG Zhi-hong

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the Science Research and Technology Development Project of the Henan Provincial Tobacco Monopoly Bureau (HYKJM201335)

Abstract: To address the decline in decision-tree prediction accuracy caused by noisy high-dimensional data, this paper applies the idea of the noise-free principal component analysis (NFPCA) algorithm to improve the C4.5 algorithm, yielding the NFPCA-in-C4.5 algorithm. On the one hand, the algorithm transforms the noise-control problem for high-dimensional data into an optimization problem that combines fitting the data features with controlling smoothness, and thereby obtains the principal component space. On the other hand, as the decision tree is built top-down and each new node is constructed, the principal component space is mapped back to the original data space so that attribute feature information is not permanently lost during dimensionality reduction. Experimental results show that NFPCA-in-C4.5 provides both dimensionality reduction and noise tolerance, and avoids the large drop in prediction accuracy caused by feature information loss and residual noise during dimensionality reduction.

Key words: High-dimensional data noise, Noise tolerance, Principal component analysis, C4.5 algorithm
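
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the paper's NFPCA-in-C4.5 procedure: the smoothness-regularized NFPCA objective is not given on this page, so ordinary rank-1 PCA (computed by power iteration) stands in for it. The function names (`top_component`, `reconstruct_rank1`, `gain_ratio`) and the toy data are hypothetical. The sketch projects the data onto a principal direction, maps it back to the original attribute space, and then scores a candidate split with the gain ratio, the criterion C4.5 uses to choose splits.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """C4.5 gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    gain = entropy(labels) - w_left * entropy(left) - w_right * entropy(right)
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info


def top_component(centered, iters=100):
    """Leading principal direction of mean-centered rows (power iteration)."""
    d = len(centered[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        # w = X^T (X v), which is proportional to the covariance matrix times v
        scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
        w = [sum(s * r[j] for s, r in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate start vector or data; keep the current v
        v = [x / norm for x in w]
    return v


def reconstruct_rank1(data):
    """Project rows onto the top principal direction, then map them back
    to the original attribute space (plain PCA, assumed stand-in for NFPCA)."""
    n, d = len(data), len(data[0])
    mean = [sum(r[j] for r in data) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in data]
    v = top_component(centered)
    restored = []
    for r in centered:
        score = sum(r[j] * v[j] for j in range(d))  # principal-space coordinate
        restored.append([mean[j] + score * v[j] for j in range(d)])
    return restored


# Toy data: two correlated attributes with small perturbations (made up).
data = [[1.0, 1.2], [2.0, 1.9], [3.0, 3.1], [4.0, 3.8], [5.0, 5.2], [6.0, 5.9]]
labels = ["a", "a", "a", "b", "b", "b"]

restored = reconstruct_rank1(data)
first_attribute = [row[0] for row in restored]
score = gain_ratio(first_attribute, labels, threshold=3.5)
```

Because the split is evaluated on `restored`, which lives in the original attribute space, the tree can still test the original attributes rather than abstract principal components. That is the motivation the abstract gives for mapping back during node construction.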

