Computer Science, 2016, Vol. 43, Issue (12): 24-29, 62. doi: 10.11896/j.issn.1002-137X.2016.12.004

Review of Concept Drift Data Streams Mining Techniques

DING Jian, HAN Meng and LI Juan   

  • Online:2018-12-01 Published:2018-12-01

Abstract: The data stream is a data model proposed in recent years. Its distinguishing characteristics are that it is dynamic, infinite, high-dimensional, ordered, high-speed and evolving. In some data stream applications, the information embedded in the data evolves over time, exhibiting concept drift or change. Such data streams are known as evolving data streams or concept drift data streams. Algorithms that mine them therefore operate under space and time restrictions and need to adapt to change automatically. In this paper, we survey concept drift and classification, clustering and pattern mining on concept drift data streams. First, we introduce the types of concept drift and the methods for detecting it. To deal with concept drift, the sliding window model is used to mine data streams. Data stream classification models include single models and ensemble models; common methods include decision trees, classification association rules and so on. Data stream clustering methods can be divided into k-means-based methods and non-k-means-based methods. Pattern mining can provide useful patterns for classification, clustering, association rules and so on. Patterns include frequent patterns, sequential patterns, episodes, sub-trees, sub-graphs, high utility patterns and so on. Finally, we introduce frequent patterns and high utility patterns in detail.

Key words: Data stream mining, Classification, Clustering, Frequent pattern mining, Concept drift

