Computer Science (计算机科学), 2016, Vol. 43, Issue (12): 24-29. doi: 10.11896/j.issn.1002-137X.2016.12.004
DING Jian (丁剑), HAN Meng (韩萌) and LI Juan (李娟)
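Abstract: A data stream is a new type of data model that is dynamic, unbounded, high-dimensional, ordered, high-speed and evolving. In real data stream environments, the distribution of some data changes over time, that is, it exhibits concept drift; such streams are called evolving data streams or concept-drifting data streams. Methods for processing data streams must therefore work within time and space constraints and adapt to concept changes. This paper surveys the concept drift problem and the classification, clustering and pattern mining of concept-drifting data streams. It first introduces the types of concept drift and the commonly used methods for detecting concept change. To cope with concept drift, data stream mining often uses the sliding window model to process recent transactions. Common models for data stream classification include single-classifier models and ensemble models, and common methods include decision trees and classification association rules. Data stream clustering approaches are usually divided into k-means-based and non-k-means-based methods. Pattern mining can provide useful information for classification, clustering and association rules. Patterns in concept-drifting data streams include frequent patterns, sequential patterns, episodes, pattern trees, pattern graphs and high-utility patterns. Finally, the paper describes frequent pattern mining algorithms and high-utility pattern mining algorithms in detail.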
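The sliding window model mentioned in the abstract can be illustrated with a minimal sketch. It is not an algorithm from the surveyed papers: the class name `SlidingWindowMiner`, the parameters `window_size` and `max_len`, and the brute-force itemset enumeration are all assumptions chosen for brevity. The sketch keeps only the most recent transactions and updates itemset counts incrementally, so that patterns from an outdated concept fade as old transactions expire.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowMiner:
    """Hypothetical helper: counts itemsets over the latest transactions."""

    def __init__(self, window_size, max_len=2):
        self.window_size = window_size  # number of recent transactions kept
        self.max_len = max_len          # largest itemset size enumerated
        self.window = deque()
        self.counts = Counter()

    def _itemsets(self, transaction):
        # Enumerate every itemset of size 1..max_len in one transaction.
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        # Count the new transaction, then expire the oldest one if needed.
        self.window.append(transaction)
        self.counts.update(self._itemsets(transaction))
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts.subtract(self._itemsets(expired))

    def frequent(self, min_support):
        # Return itemsets whose relative support in the window is high enough.
        n = len(self.window)
        return {
            itemset: count / n
            for itemset, count in self.counts.items()
            if count > 0 and count / n >= min_support
        }


if __name__ == "__main__":
    miner = SlidingWindowMiner(window_size=3)
    for tx in [["a", "b"], ["a", "c"], ["a", "b"], ["b", "c"], ["b", "c"]]:
        miner.add(tx)
    print(miner.frequent(min_support=0.6))
```

Enumerating all itemsets up to `max_len` is exponential in general; the sketch only shows how the window bounds memory and lets old transactions drop out, not how to mine efficiently.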