Computer Science ›› 2015, Vol. 42 ›› Issue (8): 60-64.

• 2014 Jiangsu Artificial Intelligence Academic Conference •

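Detection of Multi-label Data Streams Change Based on Probability of Relevance

SHI Zhong-wei and WEN Yi-min

  1. School of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin 541004; Guangxi Key Laboratory of Trusted Software, Guilin 541004
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China project "Research on Classification of Data Streams with Complex Concept Drift Based on Multi-task Learning" (61363029) and the Guangxi Key Laboratory of Trusted Software project "Intelligent Tourist Route Recommendation System Based on Multiple Information" (KX201311).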

Abstract: Research on concept drift detection has traditionally focused on single-label data streams and has paid too little attention to multi-label data streams, although such streams are common in real-world applications. Concept drift detection for multi-label data streams therefore needs further study. By analyzing label dependence, a property specific to multi-label data streams, this paper proposes a concept drift detection algorithm for multi-label data streams based on the probability of relevance. Starting from the cause of concept drift, the algorithm uses the probability of relevance to approximately describe the data distribution, monitors the change in distribution between old and new data, and thereby decides whether concept drift has occurred. Experimental results show that the proposed algorithm detects concept drift relatively quickly and accurately, and that it achieves the expected learning performance on the classification of multi-label data streams with concept drift.

Key words: Concept drift, Multi-label, Data streams, Probability of relevance, Classification
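
The abstract gives only the general idea: compare an approximate description of the old and new data distributions and flag drift when they differ. The sketch below illustrates that idea under assumptions of our own and is not the authors' algorithm. It assumes the distribution is summarized by per-label and pairwise label frequencies within a window, that the windows are compared by their largest absolute difference, and that drift is flagged above a fixed threshold. The names `relevance_profile`, `drift_score`, `drift_detected`, and the default `threshold=0.3` are all hypothetical.

```python
from itertools import combinations


def relevance_profile(window, num_labels):
    """Estimate single-label and pairwise label frequencies in a window.

    `window` is a non-empty list of label sets, one set per instance.
    (Assumed summary statistic; the paper's exact definition is not given.)
    """
    n = len(window)
    if n == 0:
        raise ValueError("window must contain at least one instance")
    profile = {}
    # Frequency with which each label is relevant.
    for j in range(num_labels):
        profile[(j,)] = sum(1 for labels in window if j in labels) / n
    # Frequency with which each pair of labels is relevant together,
    # a crude stand-in for label dependence.
    for j, k in combinations(range(num_labels), 2):
        profile[(j, k)] = sum(
            1 for labels in window if j in labels and k in labels
        ) / n
    return profile


def drift_score(old_window, new_window, num_labels):
    """Largest absolute change in any estimated frequency between windows."""
    old = relevance_profile(old_window, num_labels)
    new = relevance_profile(new_window, num_labels)
    return max(abs(old[key] - new[key]) for key in old)


def drift_detected(old_window, new_window, num_labels, threshold=0.3):
    """Flag drift when the score exceeds a hypothetical fixed threshold."""
    return drift_score(old_window, new_window, num_labels) > threshold


if __name__ == "__main__":
    old = [{0, 1}] * 10   # labels 0 and 1 always occur together
    new = [{2}] * 10      # only label 2 occurs
    print(drift_detected(old, new, num_labels=3))
```

A fixed threshold is the simplest possible decision rule. A statistically grounded test on the two windows would be a natural replacement, but the abstract does not say which rule the paper uses.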

