Computer Science, 2018, Vol. 45, Issue (9): 65-69, 74. doi: 10.11896/j.issn.1002-137X.2018.09.009

• NASAC 2017 •

Naive Bayesian Decision Tree Algorithm Combining SMOTE and Filter-Wrapper and Its Application

XU Zhao-zhao1, LI Ching-hwa1, CHEN Tong-lin1, LEE Shin-jye1,2   

  1. School of Software, Yunnan University, Kunming 650091, China
  2. Key Laboratory in Software Engineering of Yunnan Province, Kunming 650091, China
  • Received: 2017-10-09 Online: 2018-09-20 Published: 2018-10-10

Abstract: Efficiently and accurately mining the medical data generated by Internet-based smart healthcare systems in the era of "Industry 4.0" remains a serious problem. Such medical data is often high-dimensional, imbalanced and noisy, so this paper proposes a new data processing method that combines the SMOTE method with a Filter-Wrapper feature selection algorithm to support clinical decision-making. In particular, the proposed method not only mitigates the poor predictions that arise when the attribute-independence assumption of Naive Bayes does not hold in practical applications, but also avoids the over-fitting problem caused by constructing a C4.5 decision tree model. Moreover, the proposed algorithm obtains good results when applied to ECG clinical decision-making.
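As a rough illustration of the SMOTE step named in the abstract, the sketch below generates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. It is not the authors' implementation: the function name `smote`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration, and the Filter-Wrapper selection and the Naive Bayesian decision tree are not covered.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    def sq_dist(a, b):
        # Squared Euclidean distance; enough for ranking neighbours.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class (hypothetical data, two features per sample).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class; this is also why noisy minority samples can be amplified, which is one motivation for pairing over-sampling with feature selection.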

Key words: Data balance, Wrapper feature selection, Naive Bayesian, Decision tree

CLC Number: TP391