Computer Science, 2020, Vol. 47, Issue (8): 119-126. doi: 10.11896/jsjkx.200300010


ENLHS: Sampling Approach to Auto Tuning Kafka Configurations

XIE Wen-kang1, FAN Wei-bei1,2,3, ZHANG Yu-jie1,2,3, XU He1,2,3, LI Peng1,2,3

    1 School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
    2 Nanjing Center of HPC China, Nanjing 210023, China
    3 Jiangsu HPC and Intelligent Processing Engineer Research Center, Nanjing 210023, China
  • Online: 2020-08-15 Published: 2020-08-10
  • About author: XIE Wen-kang, born in 1995, postgraduate. His main research interests include cloud computing, big data frameworks and data processing.
    LI Peng, born in 1979, Ph.D., professor, master's supervisor, is a member of China Computer Federation. His main research interests include computer communication networks, cloud computing, and information security.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018YFB1003201), National Natural Science Foundation of China (61672296, 61602261, 61872196, 61872194), Scientific and Technological Support Project of Jiangsu Province (BE2017166, BE2019740), Major Natural Science Research Projects in Colleges and Universities of Jiangsu Province (18KJA520008) and Six Talent Peaks Project of Jiangsu Province (RJFW-111).

Abstract: When Kafka is deployed in a production environment, its performance is limited not only by the machine's hardware and system platform but also by its own configuration. With limited hardware resources, the configuration items are the key factor in whether Kafka achieves the desired performance, yet modifying and tuning these items by hand is extremely inefficient. Without tuning for the actual resource environment, Kafka running with default configuration parameters cannot guarantee its performance in every production environment. Because Kafka's configuration space is extremely large, traditional adaptive algorithms perform poorly in large-scale production systems. Therefore, to improve Kafka's adaptive ability, eliminate complexity in the system, and obtain better operating performance, an adaptive performance tuning method for Kafka is proposed that fully considers the influence weights of Kafka's characteristic parameters on performance. It uses sampling to improve the efficiency of data set generation and to optimize the range of data selection, which improves modeling efficiency and reduces the complexity of the optimization method. Experiments show that the algorithm optimizes the throughput and latency of open-source Kafka: under given system resources, it increases Kafka's throughput and reduces latency.

Key words: Elastic net, Kafka, Latin hypercube sampling, Message queue, Performance tuning
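
The abstract and key words name Latin hypercube sampling as the way candidate configurations are generated. The following is a minimal Python sketch of that sampling idea only, not the paper's implementation. The parameter names are real Kafka settings, but the value ranges, the function names `lhs_unit` and `sample_configs`, and the choice of three parameters are assumptions made for illustration.

```python
import random

# Real Kafka setting names; the bounds are illustrative assumptions,
# not values taken from the paper.
PARAM_RANGES = {
    "batch.size": (16384, 1048576),   # producer setting, bytes
    "linger.ms": (0, 100),            # producer setting, milliseconds
    "num.io.threads": (1, 32),        # broker setting, thread count
}


def lhs_unit(n_samples, n_dims, rng):
    """Return n_samples points in the unit hypercube [0, 1)^n_dims.

    In every dimension the interval [0, 1) is cut into n_samples
    equal strata, and each stratum holds exactly one point.
    """
    columns = []
    for _ in range(n_dims):
        # One uniformly random point inside each stratum.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(points)
        columns.append(points)
    # Transpose: one row per sample, one entry per dimension.
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_samples)]


def sample_configs(param_ranges, n_samples, seed=None):
    """Scale unit-cube samples to integer values inside each range."""
    rng = random.Random(seed)
    names = list(param_ranges)
    unit = lhs_unit(n_samples, len(names), rng)
    configs = []
    for row in unit:
        config = {}
        for name, u in zip(names, row):
            low, high = param_ranges[name]
            config[name] = round(low + u * (high - low))
        configs.append(config)
    return configs


for config in sample_configs(PARAM_RANGES, n_samples=5, seed=0):
    print(config)
```

Each generated configuration would then be benchmarked to produce one row of the training data set. Compared with independent uniform sampling, the stratification spreads a small number of samples across the full range of every parameter.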

CLC Number: TP311.5