Computer Science, 2015, Vol. 42, Issue (Z6): 146-150.

High-dimensional Data Discretization Method Based on Improved LLE

XU Tong-de   

Online: 2018-11-14 Published: 2018-11-14

Abstract: Discretization algorithms for continuous features play an important role in data mining, machine learning, and pattern recognition. Existing methods concentrate mainly on low-dimensional data, yet real-world data are often high-dimensional and nonlinear. This paper therefore presents ILLE-HD3, a high-dimensional data discretization method based on improved locally linear embedding (LLE). First, LLE is improved by taking the class information of the data into account, so that the dimensionality of high-dimensional data is reduced effectively and discretization can be carried out in a low-dimensional space. Second, on the dimension-reduced data, a discretization algorithm for continuous features based on the difference-similitude set (DSS) is proposed. It uses class-feature interdependency to select cut points in the continuous value domain, and it defines a classification error criterion to control the information loss caused by partitioning the continuous domain. Finally, evaluated with the decision tree classifiers C4.5 and C5.0, ILLE-HD3 performs better on high-dimensional nonlinear data and yields higher classification accuracy than existing algorithms.

Key words: High-dimensional data, Locally linear embedding (LLE), Discretization, Class-feature interdependency, Difference-similitude set (DSS)
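
The abstract does not give the DSS-based criterion itself, so the following is only an illustrative sketch of how class-feature interdependency can drive cut-point selection on a single continuous feature. It assumes a CAIM-style score (the sum over intervals of the squared majority-class count divided by the interval size, averaged over the number of intervals) as a stand-in, and the names `caim_score` and `best_cut` are hypothetical, not taken from the paper.

```python
from collections import Counter


def caim_score(values, labels, cuts):
    """CAIM-style class-feature interdependency for the intervals induced by `cuts`.

    Assumption: this is a stand-in criterion, not the paper's DSS-based one.
    """
    bounds = sorted(cuts)
    # One class-count table per interval; a value falls into interval r
    # when exactly r of the cut points lie strictly below it.
    tables = [Counter() for _ in range(len(bounds) + 1)]
    for v, y in zip(values, labels):
        r = sum(v > b for b in bounds)
        tables[r][y] += 1
    total = 0.0
    for table in tables:
        size = sum(table.values())
        if size:
            # Squared count of the majority class, normalised by interval size.
            total += max(table.values()) ** 2 / size
    return total / len(tables)


def best_cut(values, labels):
    """Return the midpoint between adjacent distinct values with the highest score."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None  # a constant feature offers no cut point
    return max(candidates, key=lambda c: caim_score(values, labels, [c]))


if __name__ == "__main__":
    feature = [1, 2, 3, 9, 10, 12]
    classes = ["a", "a", "a", "b", "b", "b"]
    print(best_cut(feature, classes))
```

In a pipeline like the one described in the abstract, such a selection step would run on each coordinate of the low-dimensional embedding rather than on the raw features. A full method would also select several cut points per feature and stop once the classification error criterion is met, which this sketch omits.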
