计算机科学 (Computer Science), 2020, Vol. 47, Issue (9): 60-66. doi: 10.11896/jsjkx.190800138
田野1, 寿黎但1,2, 陈珂1,2, 骆歆远1,2, 陈刚1,2
TIAN Ye1, SHOU Li-dan1,2, CHEN Ke1,2, LUO Xin-yuan1,2, CHEN Gang1,2
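Abstract: Translating natural language into query statements that a database can execute is a core challenge for intelligent interaction and human-machine dialogue systems. It is also a difficulty in connecting the big data application support platform for new power-supply trains to application platforms and in building a personalized operation and maintenance system for urban rail trains. Existing neural-network-based methods do not make full use of the rich information in data tables, which hurts query accuracy. To address how the query accuracy of a natural language query interface can be improved when table content is available as input, this paper proposes a column embedding method based on table content, which uses the content stored in each column of the table to build that column's embedding representation, and on this basis proposes a new embedding-layer structure for the model. In addition, a data augmentation method based on table content is proposed: it generates new training samples by replacing attribute values in the query with other records from the same column of the table. Finally, comparative experiments on the proposed column embedding and data augmentation methods are conducted on the WikiSQL dataset. The results show that, compared with the best-performing current model, each method used alone improves query accuracy by 0.6% to 0.8%, and the two used together improve it by nearly 1%, indicating that the proposed column embedding and data augmentation methods yield a modest improvement in query accuracy.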
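The data augmentation idea in the abstract (swapping an attribute value in the query for another record from the same column) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `augment`, the `(column, operator, value)` condition format, the dict-of-lists table layout, and the rule that a value is only replaced when it appears verbatim in the question are all hypothetical choices made for the example.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical sample format: a condition is (column name, operator, value).
Condition = Tuple[str, str, str]


def augment(question: str,
            conditions: List[Condition],
            table: Dict[str, List[str]],
            rng: random.Random,
            per_condition: int = 2) -> List[Tuple[str, List[Condition]]]:
    """Create new (question, conditions) pairs by value substitution."""
    new_samples = []
    for i, (col, op, value) in enumerate(conditions):
        # Assumption: only substitute values that occur verbatim in the question.
        if value not in question:
            continue
        # Candidate substitutes: other distinct cells stored in the same column.
        candidates = sorted({v for v in table.get(col, []) if v != value})
        k = min(per_condition, len(candidates))
        for new_value in rng.sample(candidates, k):
            # Replace the value in the question and in the matching condition,
            # so the question and its query label stay consistent.
            new_question = question.replace(value, new_value, 1)
            new_conditions = list(conditions)
            new_conditions[i] = (col, op, new_value)
            new_samples.append((new_question, new_conditions))
    return new_samples


# Toy table and query, invented for illustration only.
table = {
    "player": ["Ann Lee", "Bo Chen", "Cy Young"],
    "team": ["Hawks", "Bulls", "Hawks"],
}
samples = augment(
    "which team does Ann Lee play for",
    [("player", "=", "Ann Lee")],
    table,
    random.Random(0),
)
for q, conds in samples:
    print(q, conds)
```

Each generated sample keeps the query structure (selected column, operators) unchanged and differs only in the substituted value, which is what lets the original label be reused for the new question.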