Computer Science (计算机科学), 2020, Vol. 47, Issue (8): 119-126. doi: 10.11896/jsjkx.200300010
Special topic: High-Performance Computing
谢文康1, 樊卫北1, 2, 3, 张玉杰1, 2, 3, 徐鹤1, 2, 3, 李鹏1, 2, 3
XIE Wen-kang1, FAN Wei-bei1, 2, 3, ZHANG Yu-jie1, 2, 3, XU He1, 2, 3, LI Peng1, 2, 3
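Abstract: When Kafka is deployed in production, its performance depends not only on the machine's hardware and system platform but also on Kafka's own configuration parameters, which determine whether it can reach the desired performance with limited hardware resources. Manually modifying and tuning these parameters, however, is highly inefficient. When massive volumes of data are sent to Kafka without tuning for the actual resource environment, the default configuration cannot guarantee Kafka's performance in every production environment. Because Kafka's configuration space is very large, traditional adaptive algorithms perform poorly in large-scale production systems. To improve Kafka's adaptive capability, eliminate complexity in the system, and obtain better runtime performance, this paper proposes an adaptive performance tuning method for Kafka. The method takes full account of the weight with which each Kafka feature parameter influences performance, and uses sampling to generate the data set more efficiently and to narrow the range of data selected, which improves modeling efficiency and reduces the complexity of the optimization method. Experimental results show that the algorithm optimizes the throughput and latency of open-source Kafka: under the given system resources, throughput increases and latency decreases.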
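The abstract does not say which sampling scheme is used to generate the data set. As an illustration only, the sketch below assumes Latin hypercube sampling over a handful of Kafka parameters. The parameter names (`batch.size`, `linger.ms`, `num.io.threads`) are real Kafka configuration keys, but the ranges, the sample count, and the helper `latin_hypercube` are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical tuning ranges (assumption): the keys are real Kafka configs,
# but the bounds are illustrative and not values reported by the authors.
PARAM_RANGES = {
    "batch.size": (16384, 1048576),   # producer batch size in bytes
    "linger.ms": (0, 100),            # producer linger time in milliseconds
    "num.io.threads": (1, 32),        # broker I/O threads
}


def latin_hypercube(ranges, n_samples, seed=0):
    """Return n_samples configurations drawn by Latin hypercube sampling.

    Each parameter's range is cut into n_samples equal-width strata and one
    value is drawn from every stratum, so each range is covered evenly.
    Values are truncated to integers afterwards, which can move a value
    slightly below its stratum when the stratum width is not an integer.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        width = (high - low) / n_samples
        # One uniform draw inside each stratum.
        values = [low + (i + rng.random()) * width for i in range(n_samples)]
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(values)
        columns[name] = [int(v) for v in values]
    return [{name: columns[name][k] for name in ranges} for k in range(n_samples)]


samples = latin_hypercube(PARAM_RANGES, n_samples=5)
for config in samples:
    print(config)
```

Each sampled configuration would then be applied to a test cluster and benchmarked, and the measured throughput and latency would form one row of the training data for the performance model. Compared with a full grid, the number of benchmark runs grows with the sample count rather than with the product of all parameter ranges.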