Computer Science ›› 2026, Vol. 53 ›› Issue (6A): 250800062-11. doi: 10.11896/jsjkx.250800062

• Big Data & Data Science •

Torlink: High-performance Streaming ML Framework for Dynamic Flow-rate Data

LIANG Zheheng1,3, YU Ran2,4, CUI Lei1,3, QIN Zheng2,4, ZHANG Jinbo1,3, ZHANG Ziyang1,3, WU Mingchao2,4   

  1 Information Center, Guangdong Power Grid Limited Liability Company, Guangzhou 510000, China
  2 Key Laboratory of System Software, Chinese Academy of Sciences, Beijing 100190, China
  3 Joint Laboratory on Cyberspace Security, China Southern Power Grid, Guangzhou 510000, China
  4 University of Chinese Academy of Sciences, Beijing 100049, China
  • Online: 2026-06-16  Published: 2026-06-12
  • About author: LIANG Zheheng, born in 1986, postgraduate. His main research interests include big data processing.
    QIN Zheng, born in 1997, Ph.D. His main research interests include streaming processing and machine learning systems.
  • Supported by:
    Guangdong Power Grid Limited Liability Company (037800KC23090006) and Major Project of ISCAS (ISCAS-ZD-202302).

Abstract: With the advent of the big data era, streaming machine learning theories and methods have gained widespread attention and application. Their core strength is the ability to process continuously arriving data streams in real time and to respond quickly to dynamic changes in the data. However, typical streaming machine learning frameworks lack support for general streaming learning algorithms, and they lack effective performance optimization mechanisms for handling dynamic data flow rates. To address these issues, this paper first analyzes and summarizes the application and computational characteristics of streaming machine learning and designs a relatively general streaming machine learning data flow. It then analyzes the potential performance bottlenecks of existing frameworks and proposes two performance optimization methods: a distance-based dynamic sampling mechanism and a gradient-based window pre-aggregation mechanism. Finally, a prototype system, Torlink, is implemented on top of Flink, and experiments are conducted on four typical datasets. The results show that, on a 4-node cluster, Torlink achieves an overall throughput approximately 4.1 times higher than existing frameworks, with a horizontal speedup ratio of up to 3.3.
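
The abstract names a gradient-based window pre-aggregation mechanism but does not describe how it works. As a rough illustration of the general idea only, the sketch below averages per-sample gradients inside a fixed-size count window and applies a single model update per window. Everything in it is an assumption made for illustration and is not taken from Torlink: the linear model, the squared-error loss, the count-based window, and the hypothetical names `window_gradient` and `train_on_stream`.

```python
from typing import Iterable, List, Sequence, Tuple

# One stream record: (feature vector, label). Hypothetical type for this sketch.
Sample = Tuple[Sequence[float], float]


def window_gradient(weights: Sequence[float], window: Iterable[Sample]) -> List[float]:
    """Average gradient of 0.5 * (w.x - y)^2 over all samples in one window."""
    grad = [0.0] * len(weights)
    count = 0
    for x, y in window:
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += error * xi
        count += 1
    return [g / count for g in grad] if count else grad


def train_on_stream(
    stream: Iterable[Sample], dim: int, window_size: int = 4, lr: float = 0.1
) -> Tuple[List[float], int]:
    """Buffer samples into count windows and apply one update per full window.

    Assumption for this sketch: a trailing partial window is discarded.
    Returns the final weights and the number of updates applied.
    """
    weights = [0.0] * dim
    window: List[Sample] = []
    updates = 0
    for sample in stream:
        window.append(sample)
        if len(window) == window_size:
            grad = window_gradient(weights, window)
            weights = [w - lr * g for w, g in zip(weights, grad)]
            window.clear()
            updates += 1
    return weights, updates


if __name__ == "__main__":
    # Toy stream where the label is always 2 * x.
    toy_stream = [([1.0], 2.0)] * 40
    final_weights, n_updates = train_on_stream(toy_stream, dim=1, window_size=4, lr=0.5)
    print(final_weights, n_updates)
```

In this sketch the model is updated once per window instead of once per record, so the number of updates drops by a factor of `window_size`. Whether Torlink aggregates in this way, and how it sizes its windows under a changing flow rate, is not stated in the abstract.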

Key words: Streaming learning, Stream data, Streaming processing, Performance optimization

CLC Number: 

  • TP311