Computer Science (计算机科学) ›› 2020, Vol. 47 ›› Issue (6A): 474-479. doi: 10.11896/jsjkx.190900046
XU Jiang-feng (徐江峰) and TAN Yu-long (谭玉龙)
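Abstract: HBase is a distributed database management system that is becoming increasingly popular for applications that need fast random access to large amounts of data. However, it has many performance-critical configuration parameters that can interact with one another in complex ways, which makes tuning them by hand for optimal performance extremely difficult. This paper proposes a new approach, called auto-tuning HBase, that automatically tunes the configuration parameters for a given HBase application. The key is to build a low-cost performance model that takes the configuration parameters as input. To this end, different modeling techniques are studied systematically, and an ensemble learning algorithm is chosen to build the performance model. A genetic algorithm then searches, through the performance model, for the optimal configuration parameters of the application. As a result, the approach can quickly and automatically identify a set of configuration parameter values that gives the application its best performance. Experiments on five applications from the Yahoo! Cloud Serving Benchmark show that, compared with the default configuration, the tuned configuration improves throughput by 41% on average and by up to 97%. Meanwhile, the latency of HBase operations falls by 11.3% on average and by up to 57%.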
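The two-stage idea in the abstract (a cheap performance model, then a genetic search over that model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter names and ranges are hypothetical, `synthetic_throughput` is a made-up stand-in for actually running a benchmark, and the ensemble is a plain bagging of 3-nearest-neighbour regressors because the abstract does not say which ensemble learner was used.

```python
import random

# Hypothetical tunable parameters with (low, high) ranges; illustrative only.
# Both are treated as continuous values to keep the sketch short.
PARAM_RANGES = {"handler_count": (10, 200), "block_cache_fraction": (0.1, 0.6)}
DEFAULT_CONFIG = {"handler_count": 30, "block_cache_fraction": 0.4}
NAMES = list(PARAM_RANGES)

def synthetic_throughput(cfg):
    """Stand-in for a benchmark run; an invented function, not real HBase behaviour."""
    h, c = cfg["handler_count"], cfg["block_cache_fraction"]
    return 1000 - 0.05 * (h - 120) ** 2 - 4000 * (c - 0.45) ** 2 + 2 * h * c

def random_config(rng):
    return {n: rng.uniform(*PARAM_RANGES[n]) for n in NAMES}

def normalise(cfg):
    """Scale every parameter to [0, 1] so distances are comparable."""
    return [(cfg[n] - lo) / (hi - lo) for n, (lo, hi) in PARAM_RANGES.items()]

def knn_predict(sample, x, k=3):
    """Base learner: mean target of the k nearest training rows."""
    nearest = sorted(sample, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def train_bagging(data, n_models, rng):
    """Bagging: each base learner keeps its own bootstrap resample of the data."""
    return [[rng.choice(data) for _ in data] for _ in range(n_models)]

def predict(ensemble, cfg):
    """Ensemble prediction: average over all base learners."""
    x = normalise(cfg)
    return sum(knn_predict(sample, x) for sample in ensemble) / len(ensemble)

def genetic_search(ensemble, rng, pop_size=20, generations=15, mutation_rate=0.2):
    """Genetic algorithm whose fitness is the predicted (not measured) throughput."""
    def fitness(cfg):
        return predict(ensemble, cfg)
    # Seed the population with the default so the result is never predicted to be worse.
    population = [dict(DEFAULT_CONFIG)] + [random_config(rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: carry the two best forward unchanged
        while len(next_gen) < pop_size:
            p1 = max(rng.sample(population, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(population, 3), key=fitness)
            child = {n: rng.choice((p1[n], p2[n])) for n in NAMES}  # uniform crossover
            for n in NAMES:
                if rng.random() < mutation_rate:  # mutation: resample one parameter
                    child[n] = rng.uniform(*PARAM_RANGES[n])
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

def tune(seed=0):
    rng = random.Random(seed)
    samples = [random_config(rng) for _ in range(60)]  # configurations "measured" once
    data = [(normalise(c), synthetic_throughput(c)) for c in samples]
    ensemble = train_bagging(data, n_models=5, rng=rng)
    return ensemble, genetic_search(ensemble, rng)

if __name__ == "__main__":
    model, best = tune()
    print("suggested config:", best)
    print("predicted throughput:", predict(model, best))
    print("predicted default throughput:", predict(model, DEFAULT_CONFIG))
```

The design point this illustrates is that the expensive step (measuring configurations) happens once to collect training data, while the genetic search evaluates thousands of candidates against the model only. In practice the suggested configuration would still need to be validated by a real benchmark run.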