Computer Science ›› 2025, Vol. 52 ›› Issue (6): 139-150. doi: 10.11896/jsjkx.240300155
ZHANG Shuai, ZHOU Peng, ZHANG Yanping
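Abstract: As data volumes surge, machine learning has gradually shifted from the traditional static learning paradigm to online learning over streaming data. A capricious data stream is one in which data instances arrive one at a time while the feature space may change arbitrarily: old features may vanish at any moment, and new features may appear at any moment. For example, in environmental monitoring, adding a new sensor or the sudden failure of an existing one changes the feature space of the stream arbitrarily. Moreover, most existing online learning methods for data streams assume that the true label of every instance is available. In real applications, however, labels are mostly sparse because manual annotation is expensive. To address online learning from capricious data streams under label scarcity, this paper proposes PAACDS (Passive Aggressive Active Learning for Capricious Data Streams), an online learning algorithm based on passive-aggressive active learning, together with its variant PAACDS-I. First, an online active learning strategy selects valuable instances so that a strong predictive model can be built with minimal supervision. Then, once the queried labels of the selected instances are obtained, the online passive-aggressive update rule is combined with the margin-maximization principle to update a dynamic classifier defined over the shared and newly added feature spaces of the capricious data stream. Finally, the proposed algorithms are compared with state-of-the-art methods on 12 datasets, and extensive experimental comparison and analysis verify their effectiveness on capricious data streams with sparse labels.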
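The two steps named in the abstract (query only the valuable instances, then apply a passive-aggressive update across a changing feature space) can be illustrated with a minimal sketch. This is not the authors' PAACDS: it swaps in the standard PA-I step and a common margin-based query rule, and it does not treat shared and new features separately as the paper does. The names `predict_margin`, `should_query`, `pa_update`, `run_stream`, and the parameters `delta` and `C` are hypothetical, chosen only for illustration.

```python
import random

def predict_margin(w, x):
    """Dot product over the features shared by the model and the instance."""
    return sum(w[f] * v for f, v in x.items() if f in w)

def should_query(margin, delta, rng):
    """Assumed query rule: ask for the label with probability
    delta / (delta + |margin|), so low-confidence instances are queried
    more often. Requires delta > 0."""
    return rng.random() < delta / (delta + abs(margin))

def pa_update(w, x, y, C=1.0):
    """Standard PA-I step on a sparse weight dict (an assumption, not the
    paper's update). Features not yet in w start at weight 0."""
    loss = max(0.0, 1.0 - y * predict_margin(w, x))
    sq_norm = sum(v * v for v in x.values())
    if loss == 0.0 or sq_norm == 0.0:
        return 0.0  # passive: margin already satisfied, no change
    tau = min(C, loss / sq_norm)  # aggressive step size, capped by C
    for f, v in x.items():
        w[f] = w.get(f, 0.0) + tau * y * v
    return tau

def run_stream(stream, delta=1.0, C=1.0, seed=0):
    """Process (features, label) pairs; the label is read only when queried."""
    rng = random.Random(seed)
    w, queried = {}, 0
    for x, y in stream:
        if should_query(predict_margin(w, x), delta, rng):
            queried += 1
            pa_update(w, x, y, C)
    return w, queried
```

Each instance is a dict mapping feature names to values, with labels in {-1, +1}, so a vanished feature is simply absent from the dict and a new feature is a new key. A call might look like `run_stream([({"temp": 0.5}, 1), ({"temp": 0.4, "humidity": 0.9}, -1)])`.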