Computer Science ›› 2016, Vol. 43 ›› Issue (Z6): 374-379.doi: 10.11896/j.issn.1002-137X.2016.6A.089

Previous Articles     Next Articles

Decision Tree Algorithms for Big Data Analysis

ZHANG Yan and CAO Jian   

  • Online:2018-11-14 Published:2018-11-14

Abstract: As a predictive model in machine learning,decision tree algorithm is widely used in various fields and has been a hot research topic due to its easily understandable result.Since the speed of data generation increases explosively,conventional decision tree algorithms cannot deal well with large datasets due to the limitation of memory capacity and processor speed,and thus novel implementation is needed urgently.Based on classic implementation and optimization method of decision tree,and challenges brought by big data,we compared various kinds of corresponding algorithms.Platforms for big data algorithms were then introduced and possible directions in the future were discussed in the end.

Key words: Decision tree,Big data,Machine learning

[1] Quinlan J R.Induction of decision trees[J].Machine Learning,1986,1(1):81-106
[2] Seni G,Elder J F.Ensemble methods in data mining:improving accuracy through combining predictions[J].Synthesis Lectures on Data Mining and Knowledge Discovery,2010,2(1):1-126
[3] Breiman L,Friedman J,Stone C J,et al.Classification and regression trees[M].CRC Press,1984
[4] Shannon C E.A note on the concept of entropy[J].Bell System Tech.J,1948,27:379-423
[5] Quinlan J R.C4.5:Programming for machine learning[M].Morgan Kauffmann,1993
[6] Ferri C,Flach P,Hernández-Orallo J.Learning decision trees using the area under the ROC curve[C]∥ICML.2002:139-146
[7] Ling C X,Yang Q,Wang J,et al.Decision trees with minimal costs[C]∥Proceedings of the Twenty-first International Conference on Machine Learning.ACM,2004:4-8
[8] Lomax S,Vadera S.A survey of cost-sensitive decision tree induction algorithms[J].ACM Computing Surveys (CSUR),2013,45(2):227-268
[9] Moret S L,Langford W T,Margineantu D D.Learning to predict channel stability using biogeomorphic features[J].Ecological Modelling,2006,191(1):47-57
[10] Kretowski M,Grzes M.Evolutionary induction of mixed decision trees[J].International Journal of Data Warehousing and Mi-ning,2007,3(4):68-82
[11] Fan W,Stolfo S J,Zhang J,et al.AdaCost:misclassification cost-sensitive boosting[C]∥ICML.1999:97-105
[12] Murthy S K,Kasif S,Salzberg S.A system for induction of oblique decision trees[J].Journal of Artificial Intelligence Research,1994,2(1):1-32
[13] Barros R C,Cerri R,Jaskowiak P,et al.A bottom-up oblique decision tree induction algorithm[C]∥2011 11th International Conference on Intelligent Systems Design and Applications (ISDA).IEEE,2011:450-456
[14] Wickramarachchi D C,Robertson B L,Reale M,et al.HH-CART:An Oblique Decision Tree[J].arXiv preprint arXiv:1504.03415,2015
[15] Mingers J.An empirical comparison of pruning methods for decision tree induction[J].Machine learning,1989,4(2):227-243
[16] Mingers J.An empirical comparison of selection measures fordecision-tree induction[J].Machine Learning,1989,3(4):319-342
[17] Elouedi Z,Mellouli K,Smets P.A pre-pruning method in belief decision trees[C]∥The Ninth International Conference on Information Processing and Management of Uncertainty in Know-ledge-Based Systems IPMU.2002,1:579-586
[18] Trabelsi S,Elouedi Z,Mellouli K.Pruning belief decision tree methods in averaging and conjunctive approaches[J].International Journal of Approximate Reasoning,2007,46(3):568-595
[19] Quinlan J R.Simplifying decision trees[J].International Journal of Man-machine sSudies,1987,27(3):221-234
[20] Quinlan J R,Cameron-Jones R M.FOIL:A midterm report[C]∥Machine Learning:ECML-93.Springer Berlin Heidelberg,1993:1-20
[21] Breslow L A,Aha D W.Simplifying decision trees:A survey[J].The Knowledge Engineering Review,1997,12(1):1-40
[22] Cestnik B,Bratko I.On estimating probabilities in tree pruning[C]∥Machine Learning-EWSL-91.Springer Berlin Heidelberg,1991:138-150
[23] Breiman L.Bagging predictors[J].Machine Learning,1996,24(2):123-140
[24] Domingos P.Metacost:A general method for making classifiers cost-sensitive[C]∥Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mi-ning.ACM,1999:155-164
[25] Lin F Y,McClean S.The Prediction of financial distress using a cost sensitive approach and prior probabilities[C]∥Proceedings of the 17th International Conference on Machine Learning (ICML’00).2000
[26] Breiman L.Random forests[J].Machine Learning,2001,45(1):5-32
[27] Freund Y,Schapire R E.A decision-theoretic generalization of on-line learning and an application to boosting[J].Journal of Computer and System Sciences,1997,55(1):119-139
[28] Friedman J H.Greedy function approximation:a gradient boosting machine[J].Annals of Statistics,2001,29(5):1189-1232
[29] Freund Y,Mason L.The alternating decision tree learning algorithm[C]∥ICML.1999,99:124-133
[30] Gligorov V V,Williams M.Efficient,reliable and fast high-level triggering using a bonsai boosted decision tree[J].Journal of Instrumentation,2013,8(2):6
[31] Ayaru L,Ypsilantis P P,Nanapragasam A,et al.Prediction of Outcome in Acute Lower Gastrointestinal Bleeding Using Gradie-nt Boosting[J].PloS one,2015,10(7):e0132485
[32] Lombardo L,Cama M,Conoscenti C,et al.Binary logistic regression versus stochastic gradient boosted decision trees in assessing landslide susceptibility for multiple-occurring landslide events:application to the 2009 storm event in Messina (Sicily,southern Italy)[J].Natural Hazards,2015,79(3):1-28
[33] He X,Pan J,Jin O,et al.Practical lessons from predicting clicks on ads at facebook[C]∥Proceedings of 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.ACM,2014:1-9
[34] Haralick R M.The table look-up rule[J].Communications in Statistics-Theory and Methods,1976,5(12):1163-1191
[35] Hartmann C R P,Varshney P K,Mehrotra K G,et al.Application of information theory to the construction of efficient decision trees[J].IEEE Transactions on Information Theory,1982,28(4):565-577
[36] Knuth D E.Optimum binary search trees[J].Acta Informatica,1971,1(1):14-25
[37] Zhang S,Qin Z,Ling C X,et al.Missing is useful:missing values in cost-sensitive decision trees[J].IEEE Transactions on Knowledge and Data Engineering,2005,17(12):1689-1693
[38] Gu Q,Zhang Y,Cao J,et al.A confidence-based entity resolution approach with incomplete information[C]∥2014 International Conference on Data Science and Advanced Analytics (DSAA).IEEE,2014:97-103
[39] Utgoff P E,Berkman N C,Clouse J A.Decision tree induction based on efficient tree restructuring[J].Machine Learning,1997,29(1):5-44
[40] Gama J,Fernandes R,Rocha R.Decision trees for mining data streams[J].Intelligent Data Analysis,2006,10(1):23-45
[41] Buntine W,Niblett T.A further comparison of splitting rules for decision-tree induction[J].Machine Learning,1992,8(1):75-85
[42] Dietterich T G.An experimental comparison of three methods for constructing ensembles of decision trees:Bagging,boosting,and randomization[J].Machine Learning,2000,40(2):139-157
[43] Dietterich T G.Ensemble methods in machine learning[M]∥Multiple Classifier Systems.Springer Berlin Heidelberg,2000:1-15
[44] Opitz D,Maclin R.Popular ensemble methods:An empiricalstudy[J].Journal of Artificial Intelligence Research,1999:169-198
[45] Panda B,Herbach J S,Basu S,et al.Planet:massively parallellearning of tree ensembles with mapreduce[J].Proceedings of the VLDB Endowment,2009,2(2):1426-1437
[46] Yin W,Simmhan Y,Prasanna V K.Scalable regression treelearning on Hadoop using OpenPlanet[C]∥Proceedings of third international workshop on MapReduce and its Applications Date.ACM,2012:57-64
[47] Ye J,Chow J H,Chen J,et al.Stochastic gradient boosted distributed decision trees[C]∥Proceedings of the 18th ACM Conference on Information and Knowledge Management.ACM,2009:2061-2064
[48] Gropp W,Lusk E,Doss N,et al.A high-performance,portable implementation of the MPI message passing interface standard[J].Parallel Computing,1996,22(6):789-828
[49] Dean J,Ghemawat S.MapReduce:simplified data processing on large clusters[J].Communications of the ACM,2008,51(1):107-113
[50] Mehta M,Agrawal R,Rissanen J.SLIQ:A fast scalable classi-fier for data mining[M]∥Advances in Database Technology(EDBT’96).Springer Berlin Heidelberg,1996:18-32
[51] Bradford J P,Fortes J A B.Characterization and parallelization of decision-tree induction[J].Journal of Parallel and Distributed Computing,2001,61(3):322-349
[52] Ranka S,Singh V.CLOUDS:A decision tree classifier for large datasets[J].Knowledge Discovery and Data Mining,1998:2-8
[53] Shafer J,Agrawal R,Mehta M.SPRINT:A scalable parallelclassifier for data mining[C]∥Proc.1996 Int.Conf.Very Large Data Bases.1996:544-555
[54] Gehrke J,Ramakrishnan R,Ganti V.RainForest-A framework for fast decision tree construction of large datasets[C]∥VLDB.1998,98:416-427
[55] Gehrke J,Ganti V,Ramakrishnan R,et al.BOAT—optimistic decision tree construction[C]∥ACM SIGMOD Record.ACM,1999,28(2):169-180
[56] Domingos P,Hulten G.Mining high-speed data streams[C]∥Proceedings of the Sixth ACM SIGKDD International Confe-rence on Knowledge Discovery and Data Mining.ACM,2000:71-80
[57] Hulten G,Spencer L,Domingos P.Mining time-changing datastreams[C]∥Proceedings of the seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2001:97-106
[58] Gama J,Rocha R,Medas P.Accurate decision trees for mining high-speed data streams[C]∥Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2003:523-528
[59] Bifet A,Holmes G,Kirkby R,et al.Moa:Massive online analysis[J].The Journal of Machine Learning Research,2010,11:1601-1604
[60] Neumeyer L,Robbins B,Nair A,et al.S4:Distributed stream computing platform[C]∥2010 IEEE International Conference on Data Mining Workshops (ICDMW).IEEE,2010:170-177
[61] Wojnarski M.Debellor:a data mining platform with stream architecture[M]∥Transactions on Rough Sets IX.Springer Berlin Heidelberg,2008:405-427
[62] Fan C Y,Chang P C,Lin J J,et al.A hybrid model combining case-based reasoning and fuzzy decision tree for medical data classification[J].Applied Soft Computing,2011,11(1):632-644
[63] Wu M C,Lin S Y,Lin C H.An effective application of decision tree to stock trading[J].Expert Systems with Applications,2006,31(2):270-274
[64] Kim J W,Lee B H,Shaw M J,et al.Application of decision-tree induction techniques to personalized advertisements on internet storefronts[J].International Journal of Electronic Commerce,2001,5(3):45-62
[65] Zeitouni K,Chelghoum N.Spatial decision tree-application totraffic risk analysis[C]∥ACS/IEEE International Conference on Computer Systems and Applications.IEEE,2001:203-207

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!