An Information-Theoretic Algorithm for Discretizing Continuous Attributes in Decision Tables

Computer Science ›› 2010, Vol. 37 ›› Issue (4): 231-.


YUE Hai-liang, YAN De-qin

(School of Computer and Information Technology, Liaoning Normal University, Dalian 116081)

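Publication date: 2018-12-01 Release date: 2018-12-01
Funding:
This work was supported by the National Natural Science Foundation of China (60372071) and the Open Project Fund of the Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences (20070101).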

New Algorithm for Discretization Based on Information Entropy

YUE Hai-liang,YAN De-qin

Online:2018-12-01 Published:2018-12-01

Abstract

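Abstract: Discretization of continuous attributes matters for the machine learning and data mining stages that follow it. This paper proposes a new discretization algorithm for decision tables. The algorithm first uses information entropy as the criterion for selecting suitable cut points from the set of candidate cut points, then deletes redundant cut points to optimize the discretization result. To keep the classification ability of the decision table unchanged as far as possible, the deletion step is controlled by the inconsistency rate. Finally, several experimental data sets were selected, the discretized data were classified with a currently popular classifier, the support vector machine (SVM), and the results were compared with those of other discretization algorithms. The results show that the proposed algorithm is effective.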

Keywords: Continuous attribute discretization, Decision table, Information entropy, Inconsistency rate
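The first stage described in the abstract, scoring candidate cut points by information entropy, can be illustrated with a minimal Python sketch. The abstract does not give the exact criterion, so this sketch assumes the common class-entropy measure of a binary split and takes candidate cut points as midpoints between consecutive distinct values. The function names (`entropy`, `candidate_cuts`, `split_entropy`, `best_cut`) are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of decision labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def candidate_cuts(values):
    """Assumed candidate set: midpoints between consecutive distinct values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def split_entropy(values, labels, cut):
    """Size-weighted entropy of the two parts produced by splitting at `cut`."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def best_cut(values, labels):
    """Candidate cut with the lowest weighted entropy, or None if there is none."""
    cuts = candidate_cuts(values)
    if not cuts:
        return None
    return min(cuts, key=lambda c: split_entropy(values, labels, c))
```

For one attribute with values `[1.0, 2.0, 3.0, 4.0]` and decisions `['a', 'a', 'b', 'b']`, the candidates are 1.5, 2.5 and 3.5, and the cut at 2.5 separates the two decision classes with zero remaining entropy.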

Abstract: Discretization of continuous attributes contributes greatly to the subsequent stages of machine learning and data mining. A new discretization algorithm for decision tables, based on information entropy, is proposed. Starting from a preliminary discretization scheme, redundant cut points are deleted under an inconsistency check of the decision table. The discretized data were classified with an SVM and the results were compared with those of other algorithms, showing that the proposed algorithm is effective.

Key words: Discretization, Decision table, Information entropy, Inconsistency
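The second stage, deleting redundant cut points under an inconsistency check, can be sketched as follows. The abstract does not define the inconsistency rate, so this sketch assumes a commonly used definition: among objects that become indistinguishable after discretization, every object outside the majority decision class counts as inconsistent, and the rate is that count divided by the table size. The greedy removal order, the `threshold` parameter, and all function names are hypothetical choices made for illustration.

```python
from bisect import bisect_left
from collections import Counter, defaultdict


def discretize(row, cuts):
    """Map each continuous value to an interval index; cuts[j] is sorted."""
    return tuple(bisect_left(cuts[j], v) for j, v in enumerate(row))


def inconsistency_rate(rows, labels, cuts):
    """Share of objects outside the majority decision of their discretized group."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[discretize(row, cuts)][y] += 1
    clashes = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return clashes / len(labels)


def prune_cuts(rows, labels, cuts, threshold=0.0):
    """Greedily drop each cut whose removal keeps the rate at or below threshold."""
    kept = [list(c) for c in cuts]
    for j in range(len(kept)):
        for cut in list(kept[j]):  # iterate over a snapshot of attribute j's cuts
            trial = [
                [x for x in c if x != cut] if i == j else c
                for i, c in enumerate(kept)
            ]
            if inconsistency_rate(rows, labels, trial) <= threshold:
                kept = trial
    return kept
```

With rows `[(1.0,), (2.0,), (3.0,), (4.0,)]`, decisions `['a', 'a', 'b', 'b']` and initial cuts `[[1.5, 2.5, 3.5]]`, only the cut at 2.5 is needed to keep the table consistent, so the other two are removed.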

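YUE Hai-liang, YAN De-qin. An Information-Theoretic Algorithm for Discretizing Continuous Attributes in Decision Tables[J]. Computer Science, 2010, 37(4): 231-. https://doi.org/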

YUE Hai-liang,YAN De-qin. New Algorithm for Discretization Based on Information Entropy[J]. Computer Science, 2010, 37(4): 231-. https://doi.org/
