Computer Science (计算机科学), 2025, Vol. 52, Issue 6A: 240800114-9. doi: 10.11896/jsjkx.240800114
LIANG Zheheng (梁哲恒)1,2, WU Yuewen (吴悦文)4,5, LI Yongjian (李永健)3, ZHANG Xiaolu (张小陆)1,2, SHEN Guiquan (沈桂泉)1,2, SU Lingang (苏林刚)6, LIU Junle (刘均乐)3
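Abstract: Big data and streaming data computation are widely used in smart grids to support scenarios such as anomaly monitoring and early warning. Cloud computing is the mainstream runtime environment for big data and streaming data applications, and selecting suitable cloud resources to optimize their performance is a major challenge. Existing methods based on full-configuration search take all candidate cloud configurations as the search space; because this space is so large, the search easily becomes trapped in a local optimum. To address this problem, this paper proposes a resource-preference-sensitive cloud configuration recommendation method for big data applications. It uses a resource-preference-sensitive random forest model as the probabilistic model of Bayesian optimization, in order to balance search accuracy against search overhead when the space of configuration options is large. Experimental results show that, compared with the full-configuration search method CherryPick, the proposed method improves the accuracy of the search results by 23% while reducing the number of searches by 25%~44%. Compared with the data-driven method RP-CH, the accuracy of the search results differs by 10%, but the average number of searches is reduced by 78%.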
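The core idea in the abstract, Bayesian optimization with a random forest as the probabilistic model, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the candidate configurations `CANDIDATES`, the synthetic cost function `measure_cost`, the small bagged-tree ensemble standing in for a random forest, and the lower-confidence-bound acquisition rule. The resource-preference features are not modeled, because the abstract does not describe how they are computed.

```python
import random
import statistics


def _sse(vals):
    """Sum of squared errors around the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)


def fit_tree(X, y, rng, depth=0, max_depth=3):
    """Fit a small regression tree; each split uses one randomly chosen feature."""
    mean = sum(y) / len(y)
    if depth >= max_depth or len(set(y)) == 1:
        return mean  # leaf: a plain float
    feat = rng.randrange(len(X[0]))
    values = sorted({row[feat] for row in X})
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for row, t in zip(X, y) if row[feat] <= thr]
        right = [t for row, t in zip(X, y) if row[feat] > thr]
        sse = _sse(left) + _sse(right)
        if best is None or sse < best[0]:
            best = (sse, thr)
    if best is None:  # the chosen feature has a single value, so no split is possible
        return mean
    thr = best[1]
    left = [(r, t) for r, t in zip(X, y) if r[feat] <= thr]
    right = [(r, t) for r, t in zip(X, y) if r[feat] > thr]
    return (feat, thr,
            fit_tree([r for r, _ in left], [t for _, t in left], rng, depth + 1, max_depth),
            fit_tree([r for r, _ in right], [t for _, t in right], rng, depth + 1, max_depth))


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        feat, thr, left, right = node
        node = left if x[feat] <= thr else right
    return node


def fit_forest(X, y, rng, n_trees=25):
    """Train each tree on a bootstrap resample of the observations."""
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return trees


def predict_forest(trees, x):
    """Mean prediction plus the spread across trees, used as an uncertainty estimate."""
    preds = [predict_tree(t, x) for t in trees]
    return statistics.mean(preds), statistics.pstdev(preds)


def recommend(candidates, measure, n_init=3, budget=8, kappa=1.0, seed=0):
    """Search loop: measure a few random configs, then follow the surrogate."""
    rng = random.Random(seed)
    tried = {cfg: measure(cfg) for cfg in rng.sample(candidates, n_init)}
    while len(tried) < budget:
        remaining = [c for c in candidates if c not in tried]
        if not remaining:
            break
        X = list(tried)
        forest = fit_forest(X, [tried[c] for c in X], rng)

        def lcb(cfg):
            # Lower confidence bound: favor low predicted cost and high uncertainty.
            mean, std = predict_forest(forest, cfg)
            return mean - kappa * std

        nxt = min(remaining, key=lcb)
        tried[nxt] = measure(nxt)  # one "search" is one real measurement
    return min(tried, key=tried.get), tried


# Hypothetical candidate space: (vCPUs, memory in GB).
CANDIDATES = [(cpu, mem) for cpu in (2, 4, 8, 16) for mem in (4, 8, 16, 32)]


def measure_cost(cfg):
    """Synthetic stand-in for running the job: runtime multiplied by hourly price."""
    cpu, mem = cfg
    runtime = 100.0 / cpu + 40.0 / mem
    price = 0.05 * cpu + 0.01 * mem
    return runtime * price


best, tried = recommend(CANDIDATES, measure_cost)
print("recommended config:", best, "after", len(tried), "measurements")
```

In this sketch the number of entries in `tried` corresponds to the "number of searches" reported in the abstract: each entry is one measured run, so a surrogate that narrows the space sooner reaches a good configuration with fewer measurements.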