Computer Science ›› 2017, Vol. 44 ›› Issue (7): 167-174.doi: 10.11896/j.issn.1002-137X.2017.07.030

Previous Articles     Next Articles

Partially-lazy Learning Classification Algorithm Based on Representation of Data Stream Model

JIANG Jing-jing, WANG Zhi-hai and YUAN Ji-dong   

  • Online:2018-11-13 Published:2018-11-13

Abstract: Utilizing patterns extracted from large scale data to build classification model is one of important research problems.Exploiting patterns to estimate Bayesian probability is a feasible approach.However,most of the existing pattern-based Bayesian classifiers aim at static data set,which cannot adapt to the dynamic data stream environment.A Bayesian classification model,named PBDS(Pattern-based Bayesian classifier for Data Stream),based on pattern discove-ry over data streams was proposed.PBDS constructs local model for unseen case based on continuously updated frequent item sets with partially-lazy learning method.To accelerate data processing,the simpler data structure,i.e.,hybrid trees structure was proposed,and pattern extracting mechanism was proposed to reduce the generation of candidate itemsets.Smoothing technique was used to handle incomplete itemset extraction in the data stream.Extensive experiments on real-world and synthetic data streams show that PBDS is more accurate than state-of-the-art data stream classifiers.

Key words: Data stream,Frequent pattern,Bayesian,Partially-lazy learning

[1] AGRAWAL R,SWAMI A.Database mining:A performanceperspective [J].IEEE Transactions on Knowledge and Data Engineering,1993,5(6):914-925.
[2] BARALIS E,CAGLIERO L,GARZA P.Enbay:A novel pattern-based Bayesian classifier [J].IEEE Transaction on Know-ledge and Data Engineering,2013,25(12):2780-2795.
[3] BARALIS E,CAGLIERO L.RIB:A robust itemset-based Ba-yesian approach to classification [J].Knowledge-Based Systems,2014,71:366-375.
[4] BIFET A,HOLMES G,KIRKBY R,et al.MOA:Massive online analysis[J].Journal of Machine Learning Research,2010, 99:1601-1604.
[5] BIFET A,READ J,PFAHRINGER B,et al.Efficient datastream classification via probabilistic adaptive windows[C]∥Proceedings of the 28th Annual ACM Symposium on Applied Computing.New York:ACM,2013:801-806.
[6] DOMINGOS P,PAZZANI M.On the optimality of the simple Bayesian classifier under zero-one Loss [J].Machine Learning,1997,29(2):103-130.
[7] FAN H J,RAMAMOHANARAO K.A Bayesian approach to use emerging patterns for classification[C]∥Proceedings of the 14th Australasian Database Conference.Darlinghurst,Australia:Australian Computer Society,2003:39-48.
[8] FAYYAD U M,IRANI K B.Multi-interval discretization ofcontinuous-valued attributes for classification learning[C]∥Proceedings of the 13th Int’l Joint Conf.Artificial Intelligence.1993:1022-1027.
[9] FRIEDMAN N,GOLDSZMIDT M.Building classifiers using Bay-esian networks[C]∥Proceedings of The Thirteenth NationalConference on Artificial Intelligence.1996:1277-1284.
[10] GAMA J,SEBASTIAO R,RODRIGUES P P.Issues in evaluation of stream learning algorithms[C]∥Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2009:329-338.
[11] GAMA J,KOSINA P.Learning decision rules from data streams[C]∥Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence.2010:1255-1260.
[12] HULTEN G,SPENCER L,DOMINGOS P.Mining time-chan-ging data streams[C]∥Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2001:97-106.
[13] IKONOMOVSKA E,GAMA J,DEROSKI S.Learning model trees from evolving data streams[J].Data Mining and Know-ledge Discovery,2011,23(1):128-168.
[14] KEOGH E J,PAZZANI M J.Learning augmented Bayesianclassifiers:A comparison of distribution-based and classification-based approaches[C]∥Proceedings of the 7th Int’l Workshop on Artificial Intelligence and Statistics.San Francisco:Morgan Kaufmann Publishers,1999:225-230.
[15] LEWIS D D.Naive(Bayes) at forty:The independence assumption in information retrieval[J].Machine Learning:ECML-98,1998,1398:4-15.
[16] MCCALLUM A,NIGAM K.A comparison of event models for Naive Bayes text classification[C]∥Proceedings of AAAI-98 Workshop on ‘Learning for Text Categorization’,1998.
[17] LI H F,SHAN M K,LEE S Y.DSM-FI:An efficient algorithm for mining frequent itemsets in data streams[J].Knowledge and Information Systems,2008,17(1):79-97.
[18] MERETAKIS D,WIJTHRICH B.Extending Naive Bayes classifiers using long itemsets[C]∥Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining.New York:ACM,2014:165-174.
[19] PFAHRINGER B,HOLMES G,KIRKBY R.New options forhoeffding trees[C]∥Procee-dings of AI 2007:Advances in Artificial Intelligence.Heidelberg,Berlin:Springer-Verlag,2007:90-99.
[20] STREET W N,KIM Y S.A streaming ensemble algorithm(SEA) for large-scale classification[C]∥Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,New York,2001:377-382.
[21] SCHLIMMER J C,GRANGER R H.Incremental learning from noisy data[J].Machine Learning,1986,1(3):317-354.
[22] ZHANG P,GAO B J,ZHU X Q,et al.Enabling fast lazy lear-ning for data streams[C]∥Proceedings of 2011 IEEE 11th International Conference on Data Mining.New Jersey,USA:IEEE,2011:933-941.
[23] ZLIOBAITE I,BIFET A,PFAHRINGER B,et al.Active lear-ning with evolving streaming data[J].Lecture Note in Computer Science,2011,6913:597-612.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!