Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240800114-9.doi: 10.11896/jsjkx.240800114

• Big Data & Data Science •

Resource Preference-sensitive Cloud Configuration Recommendation Method for Big Data Applications

LIANG Zheheng1,2, WU Yuewen4,5, LI Yongjian3, ZHANG Xiaolu1,2, SHEN Guiquan1,2, SU Lingang6, LIU Junle3   

    1 Information Center of Guangdong Power Grid Limited Liability Company,Guangzhou 510000,China
    2 Joint Laboratory on Cyberspace Security,China Southern Power Grid,Guangzhou 510000,China
    3 Zhongshan Power Supply Bureau,Zhongshan,Guangdong 528400,China
    4 Institute of Software,Chinese Academy of Science,Beijing 100190,China
    5 Key Laboratory of System Software(Chinese Academy of Sciences),Beijing 100190,China
    6 Baidu Online Network Technology(Beijing) Co.,Ltd.,Beijing 100085,China
  • Online:2025-06-16 Published:2025-06-12
  • About author:LIANG Zheheng,born in 1986,master,is a member of CCF(No.T0223M).His main research interests include digital evaluation technology,Internet of Things and artificial intelligence.
    WU Yuewen,born in 1988,Ph.D,senior engineer,is a member of CCF(No.J6673M).His main research interests include performance optimization of cloud-edge systems and so on.
  • Supported by:
    Guangdong Power Grid Limited Liability Company(037800KC23090006) and National Natural Science Foundation of China(62302489).

Abstract: Big data and stream data computing are widely used to support scenarios such as anomaly detection and early warning in smart grids. Cloud computing is the mainstream operating environment for big data and stream data applications. However, optimizing performance by selecting suitable cloud resources poses significant challenges. Current methods based on exhaustive configuration search treat all candidate cloud configurations as the search space, which makes the search space excessively large and risks getting stuck in local optima. To address this issue, this paper proposes a resource preference-sensitive cloud configuration recommendation method for big data applications. It employs a resource preference-sensitive random forest model as the probabilistic model in Bayesian optimization to balance search accuracy and search cost when the configuration option space is large. Experimental results show that, compared with the exhaustive configuration search method CherryPick, the proposed method improves search accuracy by 23% while reducing the number of searches by 25%~44%. Compared with the data-driven method RP-CH, the accuracy of the search results is 10% lower, but the average number of searches is reduced by 78%.
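
The general idea of using a tree ensemble as the probabilistic model in Bayesian optimization can be sketched as follows. This is only an illustration under stated assumptions, not the paper's method: the candidate configurations, the cost function `run_cost`, the lower-confidence-bound acquisition rule, and the parameter `kappa` are all hypothetical, and the forest is a small bagged ensemble of depth-1 regression trees standing in for a full random forest. The resource preference-sensitive part of the proposed model is not reproduced here.

```python
import random
import statistics

# Hypothetical candidate cloud configurations: (vCPUs, memory in GiB).
CONFIGS = [(c, m) for c in (2, 4, 8, 16) for m in (4, 8, 16, 32, 64)]


def run_cost(cfg):
    """Synthetic stand-in for running a job on cfg and measuring its cost."""
    cpu, mem = cfg
    runtime = 100.0 / cpu + 200.0 / mem   # made-up runtime model
    price = 0.05 * cpu + 0.01 * mem       # made-up price per unit time
    return runtime * price                # lower is better


def fit_stump(samples, rng):
    """Fit a depth-1 regression tree on one randomly chosen feature."""
    feat = rng.randrange(2)
    values = sorted({x[feat] for x, _ in samples})
    mean_all = statistics.fmean(y for _, y in samples)
    best = (float("inf"), None, mean_all, mean_all)
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in samples if x[feat] <= thr]
        right = [y for x, y in samples if x[feat] > thr]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, thr, ml, mr)
    _, thr, ml, mr = best
    if thr is None:                       # the bootstrap sample had no split point
        return lambda x: mean_all
    return lambda x: ml if x[feat] <= thr else mr


def fit_forest(observed, rng, n_trees=30):
    """Bagging: each tree is trained on a bootstrap resample of the observations."""
    samples = list(observed.items())
    return [fit_stump([rng.choice(samples) for _ in samples], rng)
            for _ in range(n_trees)]


def search(budget=8, n_init=3, kappa=1.0, seed=0):
    """Sequential search: fit the surrogate, pick the most promising config, measure it."""
    rng = random.Random(seed)
    observed = {cfg: run_cost(cfg) for cfg in rng.sample(CONFIGS, n_init)}
    while len(observed) < budget:
        forest = fit_forest(observed, rng)

        def lcb(cfg):
            # Mean across trees is the prediction; spread across trees is the uncertainty.
            preds = [tree(cfg) for tree in forest]
            return statistics.fmean(preds) - kappa * statistics.pstdev(preds)

        candidates = [c for c in CONFIGS if c not in observed]
        nxt = min(candidates, key=lcb)
        observed[nxt] = run_cost(nxt)
    best = min(observed, key=observed.get)
    return best, observed


if __name__ == "__main__":
    best, observed = search()
    print("best configuration found:", best, "after", len(observed), "runs")
```

The `budget` parameter caps the number of measured configurations, which is the quantity the abstract refers to as the number of searches; a surrogate that ranks candidates well allows a smaller budget for the same result quality.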

Key words: Big data application, Cloud configuration recommendation, Resource preference, PCA, Bayesian optimization

CLC Number: 

  • TP311