Computer Science ›› 2014, Vol. 41 ›› Issue (5): 227-229. doi: 10.11896/j.issn.1002-137X.2014.05.047

Weighted Bayes Based Data Streaming Online Classification Algorithm

LU Hui-lin   

  • Online: 2018-11-14 Published: 2018-11-14

Abstract: Traditional classification algorithms require the entire training dataset before a model can be trained. In big data settings, however, data arrive as a sequential stream, so the full training set cannot be obtained in advance. This paper studies online classification of data streams for big data. It first formulates online classification as an optimization problem, then proposes a Weighted Naive Bayes classifier and an Error Adaptive classifier, and finally validates the efficiency of the proposed algorithm on two real datasets. The experiments show that the proposed algorithm achieves higher prediction accuracy than related work on noise-free data streams and still predicts more accurately when the stream is noisy, so it can be used in real online classification applications on data streams.
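
The abstract does not give the update rule or the weighting scheme, so the following is only a minimal sketch of the general idea: a categorical naive Bayes model that learns from one example at a time, with no need for the full training set. The class `OnlineNaiveBayes`, the function `prequential_accuracy`, the Laplace smoothing constant `alpha`, and the per-example `weight` argument are all illustrative assumptions, not the paper's Weighted Naive Bayes or Error Adaptive classifier.

```python
import math
from collections import defaultdict


class OnlineNaiveBayes:
    """Categorical naive Bayes updated one example at a time (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed, not from the paper)
        self.class_weight = defaultdict(float)  # accumulated weight per class label
        # feature_weight[label][feature_index][value] -> accumulated weight
        self.feature_weight = defaultdict(
            lambda: defaultdict(lambda: defaultdict(float))
        )
        self.values_seen = defaultdict(set)  # distinct values per feature index

    def learn_one(self, x, y, weight=1.0):
        """Fold one labelled example into the counts.

        `weight` must be positive. It is a hypothetical hook for whatever
        weighting a stream learner applies; the default treats all
        examples equally.
        """
        self.class_weight[y] += weight
        for i, v in enumerate(x):
            self.feature_weight[y][i][v] += weight
            self.values_seen[i].add(v)

    def predict_one(self, x):
        """Return the label with the highest log-posterior, or None if untrained."""
        if not self.class_weight:
            return None
        total = sum(self.class_weight.values())
        best_label, best_score = None, -math.inf
        for label, cw in self.class_weight.items():
            score = math.log(cw / total)  # log prior
            for i, v in enumerate(x):
                # Count an unseen value as one extra category for smoothing.
                n_values = len(self.values_seen[i] | {v})
                count = self.feature_weight[label][i].get(v, 0.0)
                score += math.log(
                    (count + self.alpha) / (cw + self.alpha * n_values)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def prequential_accuracy(stream, model):
    """Test-then-train evaluation: predict each example before learning from it."""
    correct = seen = 0
    for x, y in stream:
        if model.predict_one(x) == y:
            correct += 1
        seen += 1
        model.learn_one(x, y)
    return correct / seen if seen else 0.0
```

Because the model stores only accumulated counts, memory use depends on the number of classes and distinct feature values rather than on the length of the stream. Evaluating in test-then-train order means every prediction is made on an example the model has not yet seen.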

Key words: Big data, Decision tree, Classification algorithm, Data streaming

