Computer Science ›› 2020, Vol. 47 ›› Issue (6A): 474-479. doi: 10.11896/jsjkx.190900046

• Database & Big Data & Data Science •

Research on HBase Configuration Parameter Optimization Based on Machine Learning

XU Jiang-feng and TAN Yu-long   

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
  • Published:2020-07-07
  • About author: XU Jiang-feng, born in 1965, Ph.D., professor, is a member of China Computer Federation. His research interests include data encryption technology and network security technology.
    TAN Yu-long, born in 1994, postgraduate, is a member of China Computer Federation. His research interests include information security, network security technology, and machine learning.
  • Supported by:
    This work was supported by the Fundamental Research Funds for the Central University (20190605).

Abstract: HBase is a distributed database management system that is becoming increasingly popular for applications requiring fast random access to large amounts of data. However, it has many performance-critical configuration parameters that interact with each other in complex ways, making it extremely difficult to adjust them manually for optimal performance. This paper proposes a new method, called auto-tuning HBase, that automatically tunes the configuration parameters of a given HBase application. The key is to build a low-cost performance model that takes the configuration parameters as input. To this end, different modeling techniques are systematically studied, and an ensemble learning algorithm is used to construct the performance model. A genetic algorithm then uses the performance model to search for the optimal configuration parameters for the application. As a result, the method can quickly and automatically identify a set of configuration parameter values that maximizes application performance. In tests on 5 applications with the Yahoo! Cloud Serving Benchmark (YCSB), experimental results show that, compared with the default configuration, the optimized throughput increases by 41% on average and by up to 97%. At the same time, the latency of HBase operations decreases by 11.3% on average and by up to 57%.
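The two-stage idea in the abstract (learn a cheap performance model, then let a genetic algorithm search it) can be illustrated with a minimal sketch. This is not the authors' implementation: the ensemble here is a bagged k-nearest-neighbour regressor chosen only because it fits in a few lines of standard-library Python, the two parameters are abstract values normalized to [0, 1], and `synthetic_throughput` is a hypothetical stand-in for real benchmark measurements.

```python
import random

def synthetic_throughput(cfg):
    """Hypothetical stand-in for a measured benchmark run (not real HBase data)."""
    x, y = cfg
    return 100.0 - 80.0 * (x - 0.7) ** 2 - 60.0 * (y - 0.3) ** 2

def train_bagged_knn(samples, n_models=10, rng=None):
    """Bagging: each ensemble member is a bootstrap resample of the training set."""
    rng = rng or random.Random(0)
    return [[rng.choice(samples) for _ in samples] for _ in range(n_models)]

def predict(ensemble, cfg, k=3):
    """Average the k-nearest-neighbour estimates of all ensemble members."""
    def knn(member):
        nearest = sorted(
            member, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], cfg))
        )[:k]
        return sum(t for _, t in nearest) / len(nearest)
    return sum(knn(m) for m in ensemble) / len(ensemble)

def genetic_search(fitness, dim, pop_size=30, generations=25, rng=None):
    """Simple elitist GA: keep the better half, refill by crossover and mutation."""
    rng = rng or random.Random(1)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(dim)
            # Gaussian mutation of one gene, clamped to the valid range
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Stage 1: collect (configuration, throughput) samples and fit the model.
rng = random.Random(42)
train = []
for _ in range(200):
    cfg = (rng.random(), rng.random())
    train.append((cfg, synthetic_throughput(cfg)))
model = train_bagged_knn(train, rng=rng)

# Stage 2: search the model, not the real system, for a good configuration.
best = genetic_search(lambda c: predict(model, c), dim=2)
default = (0.5, 0.5)  # hypothetical "default configuration"
print("best configuration found:", best)
print("throughput gain over default:",
      synthetic_throughput(best) - synthetic_throughput(default))
```

The point of the design is that every fitness evaluation inside the genetic algorithm costs a model prediction rather than a benchmark run, so the search can afford hundreds of candidate configurations.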

Key words: Auto tuning, HBase, Machine learning, Performance modeling, Performance optimization

CLC Number: TP391