Computer Science, 2020, Vol. 47, Issue (11): 122-127. doi: 10.11896/jsjkx.190800093

Database & Big Data & Data Science

Study on Dynamic Adaptive Caching Strategy for Streaming Data Processing

WANG Xu-liang¹, NIE Tie-zheng¹, TANG Xin-ran², HUANG Ju¹, LI Di¹, YAN Ming-sen¹, LIU Chang¹

  ¹ School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
  ² College of Software, Northeastern University, Shenyang 110169, China
  • Received: 2019-08-16  Revised: 2019-12-17  Online: 2020-11-15  Published: 2020-11-05
  • About the authors: WANG Xu-liang, born in 1998, bachelor. His main research interests include data mining and databases.
    NIE Tie-zheng, born in 1980, Ph.D., associate professor, master's supervisor, member of the China Computer Federation. His research interests include databases, data integration and blockchain.

Abstract: Streaming data processing is widely used in current big data applications. Message middleware, or a message queue, usually serves as the data buffer in streaming data processing, and Apache Kafka is a common choice for this role. The performance of Kafka therefore largely determines the overall performance of the application system. In practice, the streaming data generated by upstream sources is usually unstable, and a static data caching strategy cannot adapt to such a variable production environment. To address this problem, a strategy that dynamically adjusts the data cache according to changes in upstream traffic can enhance the system's adaptability to its environment, enable real-time caching of streaming data and improve throughput. In the proposed dynamic caching strategy, a method for monitoring upstream data traffic is introduced, and an ARIMA model is used to predict future streaming traffic so that the streaming data storage settings can be adjusted in advance. The optimal cache settings are obtained by multi-objective optimization over experimental measurements of middleware performance under various loads. Comparative experiments show that, during peak periods of streaming data, the strategy improves the throughput of Apache Kafka by more than 150% while keeping the maximum latency within a guaranteed bound, thereby improving the overall performance of the message middleware system.
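The abstract names two computational steps: forecasting upstream traffic with an ARIMA model, and choosing cache settings by multi-objective optimization over benchmark results. The following is a minimal sketch of both under stated assumptions; it is not the authors' implementation. It assumes an ARIMA(1,1,0) order fitted by conditional least squares (the order and estimator actually used are not given in the abstract), and it substitutes a plain Pareto (non-dominated) filter plus a maximum-latency bound for the paper's multi-objective optimization. The names `arima_110_forecast`, `Measurement`, `pareto_front`, `pick_setting` and `plan_setting`, the nearest-load-level lookup, and the units are all hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Optional


def arima_110_forecast(series: list[float], steps: int = 1) -> list[float]:
    """Forecast `steps` values with an ARIMA(1,1,0) model (assumed order).

    The series is differenced once (d=1). The differences are modelled as
    d_t = c + phi * d_{t-1} + e_t, with c and phi estimated by ordinary least
    squares (conditional least squares). Forecast differences are summed back
    onto the last observed level to return forecasts on the original scale.
    """
    if len(series) < 4:
        raise ValueError("need at least 4 observations to estimate c and phi")
    diffs = [b - a for a, b in zip(series, series[1:])]
    prev, curr = diffs[:-1], diffs[1:]
    mean_prev, mean_curr = fmean(prev), fmean(curr)
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    # Constant differences give zero variance: fall back to a pure drift model.
    phi = cov / var_prev if var_prev > 0 else 0.0
    c = mean_curr - phi * mean_prev
    level, last_diff = float(series[-1]), diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # one-step-ahead forecast of the difference
        level += last_diff               # integrate back (undo the differencing)
        forecasts.append(level)
    return forecasts


@dataclass(frozen=True)
class Measurement:
    """One benchmark run of a candidate cache setting (hypothetical schema)."""
    setting: str           # label of the candidate configuration
    throughput: float      # e.g. messages/s (assumed unit), higher is better
    max_latency_ms: float  # worst observed latency, lower is better


def dominates(a: Measurement, b: Measurement) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.throughput >= b.throughput and a.max_latency_ms <= b.max_latency_ms
    better = a.throughput > b.throughput or a.max_latency_ms < b.max_latency_ms
    return no_worse and better


def pareto_front(runs: list[Measurement]) -> list[Measurement]:
    """Keep the runs that no other run dominates (input order is preserved)."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


def pick_setting(runs: list[Measurement], latency_bound_ms: float) -> Optional[Measurement]:
    """Highest-throughput Pareto-optimal run whose max latency respects the bound."""
    feasible = [r for r in pareto_front(runs) if r.max_latency_ms <= latency_bound_ms]
    return max(feasible, key=lambda r: r.throughput) if feasible else None


def plan_setting(
    traffic_history: list[float],
    runs_by_load: dict[float, list[Measurement]],
    latency_bound_ms: float,
) -> Optional[Measurement]:
    """Predict next-interval traffic, then choose a setting benchmarked near that load."""
    predicted = arima_110_forecast(traffic_history, steps=1)[0]
    # Hypothetical policy: use the benchmarked load level closest to the prediction.
    nearest_load = min(runs_by_load, key=lambda load: abs(load - predicted))
    return pick_setting(runs_by_load[nearest_load], latency_bound_ms)
```

In use, `plan_setting` would be called periodically with recent traffic samples and a table of benchmark runs keyed by the load level at which they were measured; the returned setting label would then be mapped to concrete Kafka configuration values. A production version would more likely fit the model with a statistics library and select the ARIMA order from the data rather than fixing it at (1,1,0).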

Key words: Apache Kafka, Message middleware, Multi-objective optimization, Streaming data processing, Time series forecast

CLC Number: TP311