Computer Science ›› 2024, Vol. 51 ›› Issue (7): 71-79. doi: 10.11896/jsjkx.231100200

• Database & Big Data & Data Science •

Efficient Query Workload Prediction Algorithm Based on TCN-A

BAI Wenchao1, BAI Shuwen2, HAN Xixian3, ZHAO Yubo3

  1. 1 Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
    2 Information Technology Management Center, Kaifeng University, Kaifeng, Henan 475004, China
    3 Faculty of Computing, Harbin Institute of Technology, Weihai, Shandong 264209, China
  • Received: 2023-11-29 Revised: 2024-04-26 Online: 2024-07-15 Published: 2024-07-10
  • Corresponding author: BAI Wenchao (Baiwenchao1998@163.com)
  • About author: BAI Wenchao, born in 1998, Ph.D, is a member of CCF (No. R5032G). His main research interests include explainable machine learning and intelligent big data processing.

Abstract: In big data querying, query workloads change dynamically over time and are difficult to forecast, so database management systems cannot be optimized in time. To address this problem, a query workload prediction algorithm based on a novel time series prediction model is proposed. First, the algorithm preprocesses the original historical user queries through filtering, temporal interval partitioning, and query workload construction, yielding a query workload sequence that the network model can readily analyze and process. Second, the algorithm builds a time series prediction model with a temporal convolutional network at its core, which extracts the historical trends and autocorrelation characteristics of the query workload and performs time series prediction efficiently. The model also incorporates a purpose-designed temporal attention mechanism that weights the query workload sequence by importance, preserving the model's computational efficiency and improving prediction performance. Finally, using this time series prediction model, the algorithm exploits the interval between queries to predict future query workloads accurately, so that the database management system can tune its own performance in advance and adapt to dynamic workload changes. Experimental results show that the proposed algorithm achieves good prediction performance on several evaluation metrics and predicts future query workload changes more accurately within the query time interval.

Key words: Temporal convolutional network, Attention mechanism, Query workload
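
The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' TCN-A implementation: the function names, the fixed (unlearned) kernel weights, and the use of the convolution outputs themselves as attention scores are all assumptions made for illustration. It shows only the general shape of the three ideas: partitioning query timestamps into intervals to form a workload sequence, a causal dilated convolution of the kind used in temporal convolutional networks, and a softmax weighting over time steps.

```python
import math
from collections import Counter


def build_workload_sequence(timestamps, interval, start, end):
    """Count queries per fixed-width interval over [start, end).

    Hypothetical helper: the paper's filtering and construction rules
    are not given in the abstract, so plain counting is assumed here.
    """
    n_bins = math.ceil((end - start) / interval)
    counts = Counter(
        int((t - start) // interval) for t in timestamps if start <= t < end
    )
    return [counts.get(i, 0) for i in range(n_bins)]


def causal_dilated_conv(seq, kernel, dilation):
    """1-D causal convolution: out[t] uses only seq[t], seq[t - d], ...

    The kernel is fixed here; in a trained network it would be learned.
    """
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # positions before the sequence start count as zero
                acc += w * seq[idx]
        out.append(acc)
    return out


def temporal_attention(features):
    """Softmax over time steps, then an importance-weighted sum.

    Assumption: each feature value doubles as its own attention score.
    """
    m = max(features)
    exps = [math.exp(f - m) for f in features]  # shift by max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    context = sum(w * f for w, f in zip(weights, features))
    return weights, context


# Toy example: seven query arrival times (in seconds), 1-second intervals.
timestamps = [0.5, 1.2, 1.7, 3.1, 3.4, 3.9, 5.0]
seq = build_workload_sequence(timestamps, interval=1.0, start=0.0, end=6.0)
hidden = causal_dilated_conv(seq, kernel=[0.5, 0.5], dilation=2)
weights, context = temporal_attention(hidden)
print(seq)      # [1, 2, 0, 3, 0, 1]
print(hidden)   # [0.5, 1.0, 0.5, 2.5, 0.0, 2.0]
```

In this toy run, the busiest interval (three queries) receives the largest attention weight. A practical model would stack several dilated layers with learned kernels and a learned scoring function.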

CLC Number:

  • TP302