Computer Science ›› 2024, Vol. 51 ›› Issue (7): 71-79. doi: 10.11896/jsjkx.231100200
白文超1, 白淑雯2, 韩希先3, 赵禹博3
BAI Wenchao1, BAI Shuwen2, HAN Xixian3, ZHAO Yubo3
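Abstract: In big-data query processing, query workloads change dynamically over time and are hard to predict effectively, so database management systems cannot be optimized in time. To address this problem, a query workload prediction algorithm based on a novel time series forecasting model is proposed. First, the algorithm preprocesses the raw historical user queries through filtering, time-interval partitioning, and query workload construction, yielding a query workload sequence that the network model can readily analyze. Second, it builds a time series forecasting model with a temporal convolutional network at its core, which extracts the historical trends and autocorrelation features of the workload data and performs time series forecasting efficiently; a purpose-designed temporal attention mechanism is integrated to weight the workload sequence by importance, which preserves the model's computational efficiency and improves prediction performance. Finally, based on this forecasting model, the algorithm makes full use of the time between queries to predict future query workloads accurately, so that the database management system can tune its performance in advance to adapt to dynamic changes in the workload. Experimental results show that the proposed algorithm achieves good prediction performance on multiple evaluation metrics and predicts future changes in query workload more accurately within the query time interval.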
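The preprocessing and convolution steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the log format (timestamp, SQL text), the prefix-based filter, the per-interval query count used as the workload value, and the function names `build_workload_series` and `causal_dilated_conv` are all assumptions made for illustration. The second function shows only the causal dilated convolution that temporal convolutional networks are generally built from; the paper's attention mechanism is not reproduced here.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def build_workload_series(
    log: Iterable[Tuple[float, str]],
    interval: float,
    keep_prefixes: Sequence[str] = ("SELECT",),
) -> List[int]:
    """Count retained queries per fixed-width time interval (hypothetical helper)."""
    prefixes = tuple(p.upper() for p in keep_prefixes)
    # Filtering: keep only statements whose text starts with an allowed prefix.
    kept = [(t, q) for t, q in log if q.lstrip().upper().startswith(prefixes)]
    if not kept:
        return []
    # Time-interval partitioning: bucket index relative to the earliest kept query.
    start = min(t for t, _ in kept)
    counts = Counter(int((t - start) // interval) for t, _ in kept)
    # Workload construction: one count per interval, zeros for empty intervals.
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


def causal_dilated_conv(
    x: Sequence[float], kernel: Sequence[float], dilation: int
) -> List[float]:
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation], treating x as zero before index 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # causal: only current and past samples contribute
                acc += w * x[idx]
        out.append(acc)
    return out


# Assumed toy log: (timestamp in seconds, SQL text).
log = [
    (0.0, "SELECT a FROM t"),
    (0.5, "select b FROM t"),
    (1.2, "INSERT INTO t VALUES (1)"),
    (2.1, "SELECT c FROM t"),
    (2.9, "SELECT d FROM t"),
]
series = build_workload_series(log, interval=1.0)
smoothed = causal_dilated_conv(series, kernel=[0.5, 0.5], dilation=2)
```

Because each output depends only on the current and earlier intervals, stacking such layers with growing dilation lets a model see a long history without looking into the future, which is the property a forecasting model needs.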