Computer Science ›› 2016, Vol. 43 ›› Issue (12): 179-182. doi: 10.11896/j.issn.1002-137X.2016.12.032


Self-adaptation Classification for Incomplete Labeled Text Data Stream

ZHANG Yu-hong, CHEN Wei and HU Xue-gang   

  • Online: 2018-12-01 Published: 2018-12-01

Abstract: In real-world applications, large numbers of text data streams are emerging, such as network monitoring data, online comments, and microblogs. However, these data have incomplete labels and frequent concept drifts, which pose many challenges to existing data stream classification methods. We therefore propose a self-adaptive classification algorithm for incompletely labeled text data streams. The proposed algorithm uses a labeled data chunk as the starting chunk and extracts features between the labeled data chunk and the unlabeled data chunk. Meanwhile, for unlabeled data chunks, it uses the similarity of features between two data chunks to test for concept drift. Finally, the polarity of the features of the unlabeled data chunks is calculated to predict the instances. The experimental results show that the algorithm can improve classification accuracy, especially for data with little label information and many concept drifts.

Key words: Incomplete labeled, Self-adaptation, Data stream, Concept drift
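
The abstract does not give the similarity measure, the drift threshold, or the polarity formula, so the following is only a rough sketch of the general idea and not the authors' algorithm. It assumes Jaccard similarity over the most frequent tokens of two chunks as the drift test, and smoothed log-odds of class counts as feature polarity. All function names, the threshold of 0.3, and the toy data are hypothetical. As a simplification, polarity is computed from the labeled chunk only and then applied to the unlabeled chunk.

```python
import math
from collections import Counter


def top_features(chunk, k=50):
    """Return the k tokens that occur in the most documents of a chunk."""
    counts = Counter(tok for doc in chunk for tok in set(doc))
    return {tok for tok, _ in counts.most_common(k)}


def feature_similarity(chunk_a, chunk_b, k=50):
    """Jaccard similarity of the top-k feature sets (an assumed measure)."""
    a, b = top_features(chunk_a, k), top_features(chunk_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drift_detected(chunk_a, chunk_b, threshold=0.3, k=50):
    """Flag a concept drift when similarity falls below a hypothetical threshold."""
    return feature_similarity(chunk_a, chunk_b, k) < threshold


def feature_polarity(labeled_chunk):
    """Smoothed log-odds of a token appearing in positive vs. negative documents."""
    pos, neg = Counter(), Counter()
    for doc, label in labeled_chunk:
        (pos if label == 1 else neg).update(set(doc))
    vocab = set(pos) | set(neg)
    return {t: math.log((pos[t] + 1) / (neg[t] + 1)) for t in vocab}


def predict(doc, polarity):
    """Predict 1 if the summed polarity of the document's tokens is non-negative."""
    score = sum(polarity.get(t, 0.0) for t in set(doc))
    return 1 if score >= 0 else 0


# Toy data (hypothetical): one labeled chunk followed by one unlabeled chunk.
labeled = [(["good", "great", "film"], 1), (["bad", "boring", "film"], 0)]
unlabeled = [["great", "film"], ["boring", "plot"]]

polarity = feature_polarity(labeled)
drift = drift_detected([doc for doc, _ in labeled], unlabeled)
predictions = [predict(doc, polarity) for doc in unlabeled]
```

In a fuller version, a detected drift would trigger re-estimating polarity from the new chunk rather than reusing the old scores; how the paper does this is not stated in the abstract.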

