Computer Science ›› 2020, Vol. 47 ›› Issue (11): 122-127. doi: 10.11896/jsjkx.190800093
王绪亮1, 聂铁铮1, 唐欣然2, 黄菊1, 李迪1, 闫铭森1, 刘畅1
WANG Xu-liang1, NIE Tie-zheng1, TANG Xin-ran2, HUANG Ju1, LI Di1, YAN Ming-sen1, LIU Chang1
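Abstract: Stream data processing is widely used in modern big data applications. Message middleware, or message queues, often serve as data buffers in stream processing. Apache Kafka is commonly used as this buffering middleware, and its performance largely determines the overall performance of the application system. In practice, the data traffic produced by Kafka's upstream sources is usually unstable, and a static caching strategy cannot adapt to such a changing production environment. To address this problem, a strategy that dynamically adjusts data caching according to changes in upstream traffic would strengthen the system's adaptability to its environment and improve both the real-time behavior and the throughput of stream data buffering. The dynamic caching strategy monitors upstream data traffic, uses an ARIMA model to forecast future traffic, and adjusts the store-and-forward settings for stream data in advance. The optimal values of the buffering parameters are obtained by multi-objective optimization over the results of experiments that measure middleware performance under various loads. Comparative experiments show that, during peak arrival periods of stream data, the strategy improves the data buffering throughput of Apache Kafka by more than 150% while keeping the maximum latency within a given bound, thereby improving overall system performance.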
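The monitor, forecast, and adjust loop described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: the forecast is a simplified ARIMA(1,1,0) fitted by ordinary least squares rather than a full ARIMA estimation, the function names are hypothetical, and the settings table is invented for illustration. `batch.size` and `linger.ms` are real Kafka producer settings, but the abstract does not say which parameters the authors tune or what values their multi-objective optimization selects.

```python
from typing import Dict, List, Tuple


def fit_ar1_on_differences(series: List[float]) -> float:
    """Least-squares AR(1) coefficient (no intercept) on the first differences."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(x * y for x, y in zip(diffs, diffs[1:]))
    den = sum(x * x for x in diffs[:-1])
    # Fall back to a random-walk forecast when the differences carry no signal.
    return num / den if den else 0.0


def forecast_next(series: List[float]) -> float:
    """One-step ARIMA(1,1,0)-style forecast: last value + phi * last difference."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    phi = fit_ar1_on_differences(series)
    return series[-1] + phi * (series[-1] - series[-2])


# Hypothetical lookup table: (upper bound on messages per second, settings).
# In the paper the best values come from offline multi-objective optimization;
# the numbers below are placeholders, not the authors' results.
SETTINGS_BY_LOAD: List[Tuple[float, Dict[str, int]]] = [
    (1000.0, {"batch.size": 16384, "linger.ms": 0}),
    (10000.0, {"batch.size": 65536, "linger.ms": 5}),
    (float("inf"), {"batch.size": 262144, "linger.ms": 20}),
]


def choose_settings(predicted_rate: float) -> Dict[str, int]:
    """Return the settings of the first load tier that covers the predicted rate."""
    for upper_bound, settings in SETTINGS_BY_LOAD:
        if predicted_rate <= upper_bound:
            return settings
    return SETTINGS_BY_LOAD[-1][1]


if __name__ == "__main__":
    recent_rates = [100.0, 200.0, 400.0, 800.0]  # observed messages per second
    predicted = forecast_next(recent_rates)
    print(predicted, choose_settings(predicted))
```

Choosing the settings from the predicted rate rather than the current one is what lets the buffer be reconfigured before a traffic peak arrives, which is the point of forecasting in the described strategy.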