Computer Science, 2018, Vol. 45, Issue (9): 65-69. doi: 10.11896/j.issn.1002-137X.2018.09.009

• NASAC 2017 •

Naive Bayesian Decision Tree Algorithm Combining SMOTE and Filter-Wrapper and Its Application

XU Zhao-zhao1, LI Ching-hwa1, CHEN Tong-lin1, LEE Shin-jye1,2   

  1. School of Software, Yunnan University, Kunming 650091, China
  2. Key Laboratory in Software Engineering of Yunnan Province, Kunming 650091, China
  • Received:2017-10-09 Online:2018-09-20 Published:2018-10-10

Abstract: Efficiently and accurately mining the medical data generated by Internet-based smart healthcare systems in the “Industry 4.0” era remains a serious problem. Such data are often high-dimensional, unbalanced and noisy, so this paper proposes a new data processing method that combines SMOTE with a Filter-Wrapper feature selection algorithm to support clinical decision-making. In particular, the proposed method not only mitigates the poor predictions that Naive Bayes produces when its attribute independence assumption does not hold in practice, but also avoids the over-fitting that arises when a C4.5 decision tree model is constructed. Moreover, the proposed algorithm obtains good results when applied to ECG clinical decision-making.

Key words: Data balance, Decision tree, Naive Bayesian, Wrapper feature selection
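
The data-balancing step named in the abstract can be illustrated with a minimal sketch of the basic SMOTE idea: each synthetic minority sample is placed at a random point on the segment between an existing minority sample and one of its k nearest minority neighbours. This is not the authors' implementation; the function name `smote`, its parameters, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Illustrative SMOTE: interpolate between minority samples and their neighbours.

    `minority` is a list of equal-length numeric feature vectors (hypothetical input).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new sample at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (values are made up for the example).
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic sample is a convex combination of two real minority samples, it always lies within the range spanned by the minority class on each feature. In a pipeline like the one the abstract describes, the balanced data would then be passed on to feature selection and classification.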

CLC Number: 

  • TP391