Computer Science, 2025, Vol. 52, Issue (6): 139-150. doi: 10.11896/jsjkx.240300155

• Database & Big Data & Data Science

Online Capricious Data Stream Learning with Sparse Labels

ZHANG Shuai, ZHOU Peng, ZHANG Yanping   

  1. School of Computer Science and Technology, Anhui University, Hefei 230601, China
  • Received: 2024-03-25 Revised: 2024-09-11 Online: 2025-06-15 Published: 2025-06-11
  • About author: ZHANG Shuai, born in 1996, postgraduate, is a student member of CCF (No. U3918G). His main research interests include data streams and online learning.
    ZHOU Peng, born in 1987, Ph.D., is a member of CCF (No. K6292M). His main research interests include data mining and machine learning.
  • Supported by:
    National Natural Science Foundation of China (62376001) and Natural Science Foundation of Anhui Province, China (2308085MF215).

Abstract: With the dramatic increase in data volume, machine learning methods have gradually shifted from traditional static learning to online learning modes designed for streaming data. A capricious data stream is a sequence of data instances arriving over time whose feature space may change arbitrarily: old features may disappear at any time, while new features may emerge. For example, in environmental monitoring, adding new sensors or sudden anomalies in existing sensors can change the feature space of the data stream arbitrarily. Furthermore, existing online learning methods for data streams often assume access to the true labels of all data instances. However, in real-world applications, labels are often sparse because manual annotation is costly. Therefore, to address online learning from capricious data streams with sparse labels, an online learning algorithm based on passive-aggressive active learning, called PAACDS (Passive Aggressive Active Learning for Capricious Data Streams), is proposed, along with its variant PAACDS-I. First, an online active learning strategy selects valuable data instances, so that an accurate prediction model can be built with minimal supervision. Then, after the labels of the selected instances are obtained, the dynamic classifier, which covers both the shared and the newly added feature spaces of the capricious data stream, is updated using online passive-aggressive update rules and the margin maximization principle. Finally, the proposed algorithm is compared with existing state-of-the-art methods on twelve datasets. Extensive experimental comparisons and analyses validate its effectiveness in scenarios involving capricious data streams and sparse labels.
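As a rough illustration of the select-query-update loop described in the abstract, here is a minimal Python sketch. It is not PAACDS: it assumes a standard PA-I passive-aggressive update and a margin-based randomized query rule that requests a label with probability δ/(δ+|margin|). The class, method, and parameter names (`CapriciousPAActive`, `C`, `delta`, `step`) are hypothetical, and a single sparse weight map keyed by feature name stands in for the paper's classifier over shared and newly added feature spaces.

```python
import random
from typing import Callable, Dict

Features = Dict[str, float]  # sparse instance: feature name -> value


class CapriciousPAActive:
    """Hypothetical sketch (not the authors' PAACDS): a binary PA-I learner
    over a feature space that may change at every step, combined with a
    margin-based randomized label query."""

    def __init__(self, C: float = 1.0, delta: float = 1.0, seed: int = 0):
        self.w: Features = {}       # one weight per feature ever observed
        self.C = C                  # PA-I cap on the step size (aggressiveness)
        self.delta = delta          # assumed query-rate parameter; larger = more queries
        self.rng = random.Random(seed)
        self.queries = 0            # number of labels requested so far

    def margin(self, x: Features) -> float:
        # Only features present in x contribute: vanished features are simply
        # absent, and newly emerged ones start with an implicit weight of 0.
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x: Features) -> int:
        return 1 if self.margin(x) >= 0.0 else -1

    def should_query(self, x: Features) -> bool:
        # Request the label with probability delta / (delta + |margin|), so
        # low-confidence (small-margin) instances are labeled more often.
        p = self.delta / (self.delta + abs(self.margin(x)))
        return self.rng.random() < p

    def update(self, x: Features, y: int) -> None:
        # PA-I step: hinge loss, tau = min(C, loss / ||x||^2), applied to every
        # observed feature, i.e. to shared and newly added ones alike.
        loss = max(0.0, 1.0 - y * self.margin(x))
        sq_norm = sum(v * v for v in x.values())
        if loss > 0.0 and sq_norm > 0.0:
            tau = min(self.C, loss / sq_norm)
            for f, v in x.items():
                self.w[f] = self.w.get(f, 0.0) + tau * y * v

    def step(self, x: Features, label_oracle: Callable[[], int]) -> int:
        """Predict on x, then update only if the label is queried."""
        y_hat = self.predict(x)
        if self.should_query(x):
            self.queries += 1
            self.update(x, label_oracle())
        return y_hat


if __name__ == "__main__":
    # Made-up toy stream: "f1" vanishes at t=50, "f2" emerges at t=30.
    rng = random.Random(1)
    model = CapriciousPAActive(C=0.5, delta=0.5)
    mistakes = 0
    for t in range(200):
        z = rng.uniform(-1.0, 1.0)
        x: Features = {"noise": rng.uniform(-0.1, 0.1)}
        if t < 50:
            x["f1"] = z
        if t >= 30:
            x["f2"] = 2.0 * z
        y = 1 if z >= 0.0 else -1
        mistakes += int(model.step(x, lambda y=y: y) != y)
    print(f"mistakes={mistakes}, labels queried={model.queries}/200")
```

Keeping one sparse weight map means a vanished feature's weight is simply never read again, and an emerging feature enters with weight 0 and is learned from its first queried label onward. How PAACDS and PAACDS-I actually separate the shared and new feature spaces, and how they choose which instances to query, may differ from this sketch.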

Key words: Online learning, Capricious data streams, Dynamic feature space, Active learning, Sparse label

CLC Number: TP391