Computer Science, 2024, Vol. 51, Issue (7): 40-48. doi: 10.11896/jsjkx.231000143
刘雨蒙1,2, 赵怡婧1,2, 王碧聪1, 王潮1, 张宝民1
LIU Yumeng1,2, ZHAO Yijing1,2, WANG Bicong1, WANG Chao1, ZHANG Baomin1
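Abstract: In recent years, the rapid development of big data, cloud computing, and related technologies has produced data at massive scale, and applications of all kinds have become increasingly dependent on database technology. Traditional databases, however, are generally operated through the formal query language SQL. For users without programming or database experience, complex SQL syntax is hard to master, which makes databases less convenient to use across application domains. Meanwhile, rapid advances in artificial intelligence techniques such as machine learning and deep neural networks, and in particular the wave of large language model research set off by the arrival of ChatGPT, have driven a deep integration of databases with artificial intelligence and a corresponding technological shift. Intelligent methods can automatically synthesize SQL from user input, meeting the needs of database users at different skill levels and improving the intelligence, environmental adaptability, and user-friendliness of databases. To give a comprehensive account of the latest research progress in intelligent synthesis of database query languages, this paper takes three types of user input as its starting point: example input, text input, and speech input. For each type it describes in detail the research lineage, representative work, and latest progress of the corresponding synthesis models, and it summarizes and compares the technical frameworks of the methods. Finally, it summarizes the survey as a whole and, in light of the problems and challenges facing existing methods, outlines directions for future work.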