Computer Science, 2020, Vol. 47, Issue 11A: 562-569. doi: 10.11896/jsjkx.200200086

• Software Engineering & Database •

  • Corresponding author: DOU Quan-sheng (douquansheng1@126.com)
  • About author: wanwenjun131@163.com

SQL Grammar Structure Construction Based on Relationship Classification and Correction

WAN Wen-jun1, DOU Quan-sheng1,2, CUI Pan-pan1, ZHANG Bin1, TANG Huan-ling1,2   

  1. School of Computer Science and Technology, Shandong Technology and Business University, Yantai, Shandong 264000, China
  2. Co-innovation Center of Shandong Colleges and Universities: Future Intelligent Computing, Yantai, Shandong 264000, China
  • Online: 2020-11-15 Published: 2020-11-17
  • About author: WAN Wen-jun, born in 1996, postgraduate. His main research interests include natural language processing and deep learning.
    DOU Quan-sheng, born in 1971, Ph.D., professor, is a member of China Computer Federation. His main research interests include natural language processing and deep learning.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61976125, 61976124, 61772319, 61773244) and the High Education Science and Technology Planning Program of Shandong Provincial Education Department (J18KA340, J18KA385).

Abstract: To address the difficulty of constructing the SQL grammar structure of nested queries, this paper proposes GSC-RCC, a method that combines relationship classification and correction and represents SQL grammar with three types of relationships between entities. First, a deep relationship classification model is designed, and words commonly used in column names are introduced to improve its performance. The model estimates the probability that each entity pair in a statement belongs to each relationship, and these probabilities are used to generate an uncorrected undirected graph. Then, a relationship correction algorithm based on SQL grammar corrects the undirected graph, from which the SQL grammar structure is constructed. On a real estate data query task, GSC-RCC generates the grammar structure of complex multi-condition queries containing nesting with an accuracy of 92.25%, and it reduces the model's dependence on the number of statement samples.

Key words: Deep learning, NL2SQL, Relationship classification, Relationship correction, SQL grammar structure

CLC Number: 

  • TP312
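
The two-stage pipeline described in the abstract (pairwise relationship probabilities, then an uncorrected undirected graph, then a grammar-based correction) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the three relationship types, so the labels `condition`, `logic`, and `nest` are hypothetical, the probabilities are invented toy values rather than model output, and the single correction rule shown is a stand-in, not the paper's correction algorithm.

```python
# Hypothetical relationship labels; the abstract only states that there are three types.
RELATIONS = ("condition", "logic", "nest")
NO_RELATION = "none"


def build_uncorrected_graph(pair_probs):
    """Stage 1: keep the most probable relationship for each entity pair, if any."""
    graph = {}
    for pair, probs in pair_probs.items():
        label = max(probs, key=probs.get)
        if label != NO_RELATION:
            graph[pair] = label
    return graph


def correct_graph(graph, pair_probs, entity_types):
    """Stage 2 (toy rule, an assumption rather than the paper's algorithm):
    every value entity must share exactly one "condition" edge with a column."""
    corrected = dict(graph)
    columns = [e for e, t in entity_types.items() if t == "column"]
    if not columns:
        return corrected
    for value in (e for e, t in entity_types.items() if t == "value"):
        candidates = [frozenset((value, c)) for c in columns]
        # Choose the column whose "condition" probability with this value is highest.
        best = max(
            candidates,
            key=lambda p: pair_probs.get(p, {}).get("condition", 0.0),
        )
        # Remove any competing "condition" edges, then enforce the best one.
        for pair in candidates:
            if corrected.get(pair) == "condition":
                del corrected[pair]
        corrected[best] = "condition"
    return corrected


# Toy input: two columns and one value, with invented probabilities.
entity_types = {"price": "column", "area": "column", "500": "value"}
pair_probs = {
    frozenset(("price", "500")): {"condition": 0.45, "logic": 0.03, "nest": 0.02, "none": 0.50},
    frozenset(("area", "500")): {"condition": 0.20, "logic": 0.05, "nest": 0.05, "none": 0.70},
    frozenset(("price", "area")): {"condition": 0.05, "logic": 0.80, "nest": 0.05, "none": 0.10},
}

graph = build_uncorrected_graph(pair_probs)
corrected = correct_graph(graph, pair_probs, entity_types)

for pair, label in corrected.items():
    print(sorted(pair), label)
```

In this toy input the classifier alone leaves the value `500` unattached, because `none` narrowly wins for both of its pairs. The correction step attaches it to the column with the highest `condition` probability, which is the kind of repair a grammar-aware pass can make when the classifier is uncertain.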