Computer Science ›› 2024, Vol. 51 ›› Issue (7): 40-48. doi: 10.11896/jsjkx.231000143

• Database & Big Data & Data Science •

Advances in SQL Intelligent Synthesis Technology

LIU Yumeng1,2, ZHAO Yijing1,2, WANG Bicong1, WANG Chao1, ZHANG Baomin1   

  1. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2023-10-19 Revised: 2024-03-29 Online: 2024-07-15 Published: 2024-07-10
  • About author: LIU Yumeng, born in 1989, Ph.D. His main research interests include database technology, time series data analysis, and data mining.
    ZHAO Yijing, born in 1994, Ph.D. candidate. Her main research interests include database technology and data mining.

Abstract: In recent years, with the rapid development of technologies such as big data and cloud computing, large-scale data generation has deepened the dependence of applications on database technology. However, traditional databases are typically operated through the formal query language SQL, which poses a significant barrier for users without programming or database experience and reduces the accessibility of databases across fields. With the rapid advancement of artificial intelligence technologies such as machine learning and deep neural networks, and especially the surge in large language model technology sparked by the emergence of ChatGPT, databases and intelligent technology have undergone deep integration and technological transformation. Intelligent methods are used to automatically translate user input into SQL, meeting the operational needs of database users with varying levels of expertise and enhancing the intelligence, environmental adaptability, and user-friendliness of databases. To comprehensively survey the latest research on intelligent SQL synthesis, this paper examines three types of user input (example-based, text-based, and voice-based) and presents the research trajectory, representative works, and latest advances of the corresponding intelligent synthesis models. It also categorizes and compares the technical frameworks of these methods and provides an overall summary. Finally, it discusses future development directions in light of the problems and challenges facing current methods.

Key words: Database technology, Intelligent SQL synthesis, Semantic parsing, SQL syntax, Large language models

CLC Number: TP315