计算机科学 ›› 2025, Vol. 52 ›› Issue (8): 109-117.doi: 10.11896/jsjkx.250100151

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于学习排序的查询优化算法

余阳, 彭煜玮   

  1. 武汉大学计算机学院 武汉 430061
  • 收稿日期:2025-01-23 修回日期:2025-03-07 出版日期:2025-08-15 发布日期:2025-08-08
  • 通讯作者: 彭煜玮(ywpeng@whu.edu.cn)
  • 作者简介:(yu_yang@whu.edu.cn)
  • 基金资助:
    国家重点研发计划(2023YFB4503604)

Query Optimization Algorithm Based on Learning to Rank

YU Yang, PENG Yuwei   

  1. School of Computer Science,Wuhan University,Wuhan 430061,China
  • Received:2025-01-23 Revised:2025-03-07 Online:2025-08-15 Published:2025-08-08
  • About author:YU Yang,born in 2000,postgraduate.His main research interests include database,query optimization,and AI4DB.
    PENG Yuwei,born in 1980,Ph.D,associate professor.His main research in terests include database systems,big data,and digital watermarking.
  • Supported by:
    National Key R&D Program of China(2023YFB4503604).

摘要: 查询优化是关系型数据库中的关键环节。在传统的查询优化过程中,为了获得较优的执行计划,通常需要对查询中的连接和过滤操作进行基数估计。然而,基数估计存在不准确的问题,导致查询优化效果往往不尽如人意。目前,已有部分研究通过基于机器学习的方法改善基数估计问题并取得了一定进展。尽管这些方法在处理查询中数值类型的过滤谓词时表现较好,但对于其他复杂的过滤谓词效果不佳。为解决这一问题,文中提出了一种基于学习排序的查询优化算法。该算法能够为单一查询智能评估多个执行计划并排序,从而选择最佳计划执行。该查询优化算法通过迭代挖掘较优执行计划,并协同机器学习方法,最终筛选出最优计划。实验结果表明,该算法在常规数据集上的性能优于当前基于学习的查询优化算法,并且在复杂数据集中具有更加显著的优势。

关键词: 查询优化, 计划生成, 学习排序, 数据库, 连接顺序, 连接类型, 扫描类型

Abstract: Query optimization is a key aspect of relational databases.In the traditional query optimization process,cardinality estimation of join and filter operations in a query is usually required in order to obtain a better execution plan.However,due to the inaccuracy of cardinality estimation,the results of query optimization are often unsatisfactory.Currently,some researches have been conducted to improve the cardinality estimation through machine learning-based methods and have made some progress.This paper finds that although these methods perform better in dealing with filtering predicates of numerical types in queries,they are ineffective for other complex filtering predicates.To address this problem,this paper proposes a query optimization algorithm based on learning to rank.The algorithm is capable of intelligently evaluating and ranking multiple execution plans for a single query to select the best plan for execution.The query optimization algorithm iteratively mines the better execution plans and collaborates with machine learning methods to finally filter out the optimal plan.Experimental results show that the proposed algorithm outperforms current learning-based query optimization algorithms on regular datasets and is more significant on complex datasets.

Key words: Query optimization, Plan generation, Learning to rank, Database, Join order, Join type, Scan type

中图分类号: 

  • TP392
[1]LEIS V,GUBICHEV A,MIRCHEV A,et al.How Good Are Query Optimizers,Really?[J].Proceedings of the VLDB Endowment,2015,9(3):204-215.
[2]WU P,CONG G.A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation[C]//International Conference on Management of Data,Virtual Event(SIGMOD '21).ACM,2021:2009-2022.
[3]WU Z,SHAIKHHA A.BayesCard:A Unified Bayesian Framework for Cardinality Estimation[J].arXiv:2012.14743,2020.
[4]YANG Z,KAMSETTY A,LUAN S,et al.NeuroCard:One Cardinality Estimator for All Tables[J].Proceedings of the VLDB Endowment,2020,14(1):61-73.
[5]YANG Z,LIANG E,KAMSETTY A,et al.Deep Unsupervised Cardinality Estimation[J].Proceedigns of the VLDB Endowment,2019,13(3):279-292.
[6]ZHANG J,ZHANG C,LI G,et al.AutoCE:An Accurate and Efficient Model Advisor for Learned Cardinality Estimation[C]//39th IEEE International Conference on Data Engineering(ICDE 2023).IEEE,2023:2621-2633.
[7]ZHU R,WU Z,HAN Y,et al.FLAT:Fast,Lightweight andAccurate Method for Cardinality Estimation[J].Proceedings VLDB Endowment,2021,14(9):1489-1502.
[8]MARCUS R,NEGI P,MAO H,et al.Bao:Making LearnedQuery Optimization Practical[C]//SIGMOD.ACM,2021:1275-1288.
[9]ZHU R,CHEN W,DING B,et al.LERO:A Learning-to-Rank Query Optimizer[J].Proceedings of the VLDB Endowment,2023,16(6):1466-1479.
[10]CHEN X,CHEN H,LIANG Z,et al.LEON:A new framework for ml-aided query optimization[J].Proceedings VLDB Endowment,2023,16:2261-2273.
[11]LAN H,BAO Z,PENG Y.A Survey on Advancing the DBMS Query Optimizer:Cardinality Estimation,Cost Model,and Plan Enumeration[J].Data Sci.Eng.,2021,6(1):86-101.
[12]OLKEN F,ROTEM D.Random Sampling from Database Files:A Survey[C]//Statistical and Scientific Database Management,5th International Conference SSDBM,Charlotte,NC(Lecture Notes in Computer Science,Vol.420).Springer,1990:92-111.
[13]CORMODE G,GAROFALAKIS M N,HAAS P J,et al.Synopses for Massive Data:Samples,Histograms,Wavelets,Sketches[J].Found.Trends Databases,2012,4(1/2/33):1-294.
[14]HASAN S,THIRUMURUGANATHAN S,AUGUSTINE J,et al.Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries[C]//Proceedings of the 2020 International Conference on Management of Data,SIGMOD Conference 2020.ACM,2020:1035-1050.
[15]HILPRECHT B,SCHMIDT A,KULESSA M,et al.DeepDB:Learn from Data,not from Queries![J].Proceedings of the VLDB Endowment,2020,13(7):992-1005.
[16]KIPF A,KIPF T,RADKE B,et al.Learned Cardinalities:Estimating Correlated Joins with Deep Learning[J].arXiv:1809.00677,2018.
[17]SUN J,LI G,TANG N.Learned Cardinality Estimation for Similarity Queries[C]//International Conference on Management of Data(SIGMOD '21).ACM,2021:1745-1757.
[18]ILYAS I F,BESKALES G,SOLIMAN M A.Probabilistic query optimization in database systems[J].ACM SIGMOD Record,2008,37(3):4-11.
[19]ROY D,PANDA P,ROY K.Tree-CNN:A Deep Convolutional Neural Network for Lifelong Learning[J].arXiv:1802.05800,2018.
[20]MOU L,LI G,ZHANG L,et al.Convolutional Neural Networks over Tree Structures for Programming Language Processing[C]//AAAI.AAAI Press,2016:1287-1293.
[21]MARCUS R,PAPAEMMANOUIL O.Plan-Structured DeepNeural Network Models for Query Performance Prediction[J].Proceedings VLDB Endowment,2019,12(11):1733-1746.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!