Computer Science ›› 2017, Vol. 44 ›› Issue (7): 167-174. DOI: 10.11896/j.issn.1002-137X.2017.07.030

• Artificial Intelligence •

Partially-lazy Learning Classification Algorithm Based on Representation of Data Stream Model

JIANG Jing-jing, WANG Zhi-hai and YUAN Ji-dong

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing 100044, China
  • Online: 2018-11-13  Published: 2018-11-13
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61672086) and the Beijing Natural Science Foundation (4142042).


Abstract: Building classification models from patterns extracted from large-scale data is one of the important research problems in pattern mining. Exploiting a set of patterns to estimate Bayesian probabilities is a feasible approach. However, most existing pattern-based Bayesian classifiers are designed for static data sets and cannot adapt to rapidly changing, unbounded data stream environments. To address this, a Bayesian classification model based on pattern discovery over data streams, named PBDS (Pattern-based Bayesian classifier for Data Stream), was proposed. Using a partially-lazy learning strategy, PBDS builds a local classification model for each unseen instance over continuously updated frequent itemsets. To accelerate stream processing, a structurally simpler hybrid tree structure was proposed, together with a pattern extraction mechanism restricted to the items of the given instance, which reduces the generation of candidate itemsets. When itemset extraction from the stream is incomplete, a smoothing technique is used to handle the items that were not extracted. Extensive experiments on real-world and synthetic data streams show that PBDS is more accurate than state-of-the-art data stream classifiers.

Key words: Data stream, Frequent pattern, Bayesian, Partially-lazy learning
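
The abstract describes a two-phase procedure: itemset counts are updated eagerly as stream instances arrive, while the classification model itself is built lazily, per test instance, from itemsets restricted to that instance's items, with smoothing for itemsets that were never extracted. The Python sketch below illustrates that idea only; the class PBDSSketch, the methods learn_one and predict_one, the parameters max_pattern_len and laplace, and the greedy longest-itemset scoring are illustrative assumptions and do not reproduce the paper's hybrid tree structure or its exact probability estimates.

from collections import defaultdict
from itertools import combinations
from math import log

class PBDSSketch:
    """Illustrative (hypothetical) partially-lazy, pattern-based Bayesian scorer."""

    def __init__(self, max_pattern_len=2, laplace=1.0):
        self.max_pattern_len = max_pattern_len   # cap on itemset length counted per instance
        self.laplace = laplace                   # smoothing constant for unextracted itemsets
        self.class_counts = defaultdict(int)     # N(c): instances seen per class
        self.itemset_counts = defaultdict(lambda: defaultdict(int))  # N(X, c)

    def learn_one(self, items, label):
        # Eager part: update per-class itemset counts from one stream instance.
        self.class_counts[label] += 1
        for k in range(1, self.max_pattern_len + 1):
            for itemset in combinations(sorted(items), k):
                self.itemset_counts[label][itemset] += 1

    def predict_one(self, items):
        # Lazy part: score each class using only itemsets drawn from this
        # instance's items, longest first, smoothing itemsets never counted.
        total = sum(self.class_counts.values()) or 1
        candidates = []
        for k in range(self.max_pattern_len, 0, -1):
            candidates.extend(combinations(sorted(items), k))
        best_label, best_score = None, float("-inf")
        for label, n_c in self.class_counts.items():
            score = log(n_c / total)              # log prior P(c)
            covered = set()
            for itemset in candidates:
                if covered.issuperset(itemset):   # already explained by a longer itemset
                    continue
                n_xc = self.itemset_counts[label].get(itemset, 0)
                score += log((n_xc + self.laplace) / (n_c + 2 * self.laplace))
                covered.update(itemset)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy usage on a stream of (items, label) pairs:
stream = [({"a", "b"}, "+"), ({"a", "c"}, "+"), ({"b", "c"}, "-")]
clf = PBDSSketch()
for items, label in stream:
    clf.learn_one(items, label)
print(clf.predict_one({"a", "b"}))   # prints "+": the pair (a, b) was only seen with "+"

The dictionary of counters here merely stands in for the hybrid tree described in the abstract: both serve to look up how often an itemset has co-occurred with each class, which is all the lazy scoring step needs.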

