Computer Science, 2016, Vol. 43, Issue (12): 179-182. doi: 10.11896/j.issn.1002-137X.2016.12.032
ZHANG Yu-hong, CHEN Wei and HU Xue-gang (张玉红, 陈伟, 胡学钢)
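Abstract: Applications such as network monitoring, online reviews and microblogging produce large volumes of text data streams. Because these data are only partially labeled and undergo frequent concept drift, they pose challenges for existing data stream classification methods. To address this problem, this paper proposes an adaptive classification algorithm for incompletely labeled text data streams. The algorithm uses a labeled data chunk as the starting chunk. For each unlabeled chunk, it first extracts the feature set shared by the labeled chunk and the unlabeled chunk, then detects concept drift from the similarity of these features across the two chunks, and finally computes the polarity of the features in the unlabeled data and uses it to predict labels for the data. Experimental results show that the algorithm achieves superior classification accuracy, especially when little label information is available and concept drift is frequent.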
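The abstract names three steps (shared-feature extraction, similarity-based drift detection, polarity-based prediction) but gives no formulas. The Python sketch below shows one way such a pipeline could be wired together. Every concrete choice in it is an assumption, not the authors' method: documents are token lists with binary labels +1/-1; the shared feature set is taken to be the intersection of the two chunks' vocabularies; drift is flagged when the cosine similarity of the chunks' relative document-frequency vectors falls below a hypothetical threshold `tau`; and a feature's polarity is (positive documents minus negative documents) divided by the documents containing it in the labeled chunk. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Doc = List[str]  # assumption: a document is a list of tokens


def doc_freq(chunk: Sequence[Doc]) -> Counter:
    """Count, for each token, how many documents in the chunk contain it."""
    df: Counter = Counter()
    for doc in chunk:
        df.update(set(doc))
    return df


def shared_features(labeled: Sequence[Doc], unlabeled: Sequence[Doc]) -> Set[str]:
    """Hypothetical rule: the shared feature set is the vocabulary intersection."""
    return set(doc_freq(labeled)) & set(doc_freq(unlabeled))


def chunk_similarity(labeled: Sequence[Doc], unlabeled: Sequence[Doc]) -> float:
    """Cosine similarity of the chunks' relative document-frequency vectors.

    Both chunks are assumed non-empty. Features missing from one chunk count as 0,
    so a small vocabulary overlap lowers the similarity.
    """
    df_l, df_u = doc_freq(labeled), doc_freq(unlabeled)
    vocab = sorted(set(df_l) | set(df_u))
    a = [df_l[f] / len(labeled) for f in vocab]
    b = [df_u[f] / len(unlabeled) for f in vocab]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def feature_polarity(labeled: Sequence[Doc], labels: Sequence[int],
                     feats: Set[str]) -> Dict[str, float]:
    """Polarity in [-1, 1]: (positive docs - negative docs) / docs containing f."""
    pos: Counter = Counter()
    neg: Counter = Counter()
    for doc, y in zip(labeled, labels):
        (pos if y > 0 else neg).update(set(doc) & feats)
    return {f: (pos[f] - neg[f]) / (pos[f] + neg[f])
            for f in feats if pos[f] + neg[f] > 0}


def classify_chunk(labeled: Sequence[Doc], labels: Sequence[int],
                   unlabeled: Sequence[Doc], tau: float = 0.5) -> Tuple[List[int], bool]:
    """Extract shared features, flag drift, then predict by summed polarity."""
    feats = shared_features(labeled, unlabeled)
    drift = chunk_similarity(labeled, unlabeled) < tau  # tau is a hypothetical threshold
    pol = feature_polarity(labeled, labels, feats)
    preds = []
    for doc in unlabeled:
        score = sum(pol.get(t, 0.0) for t in set(doc))
        preds.append(1 if score >= 0 else -1)  # ties default to +1 (arbitrary choice)
    return preds, drift


if __name__ == "__main__":
    labeled = [["good", "great", "phone"], ["bad", "awful", "phone"], ["great", "battery"]]
    labels = [1, -1, 1]
    unlabeled = [["great", "screen"], ["awful", "screen"], ["phone", "bad"]]
    print(classify_chunk(labeled, labels, unlabeled))  # prints ([1, -1, -1], False)
```

The sketch only reports the drift flag; the abstract does not say how the algorithm adapts once drift is detected, so no reaction to drift is modeled here.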