Computer Science ›› 2020, Vol. 47 ›› Issue (9): 60-66. doi: 10.11896/jsjkx.190800138

• Database & Big Data & Data Science •

Natural Language Interface for Databases with Content-based Table Column Embeddings

TIAN Ye1, SHOU Li-dan1,2, CHEN Ke1,2, LUO Xin-yuan1,2, CHEN Gang1,2   

  1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  2. Key Laboratory of Big Data Intelligent Computing of Zhejiang Province, Hangzhou 310027, China
  • Received: 2019-08-28 Published: 2020-09-10
  • About author: TIAN Ye, born in 1996, postgraduate. His main research interests include knowledge graphs and natural language processing.
    CHEN Ke, born in 1977, Ph.D., associate professor. Her main research interests include spatio-temporal data management, Web data mining and data privacy protection.
  • Supported by:
    National Key R&D Program of China (2017YFB1201001), National Natural Science Foundation of China (61672455) and Natural Science Foundation of Zhejiang Province, China (LY18F020005).

Abstract: Converting natural language into query statements that can be executed in a database is a core problem of intelligent interaction and human-computer dialogue systems, and is also an urgent need of personalized operation and maintenance systems for urban rail trains. It is likewise a difficulty in connecting the underlying application platform with the big-data application support platform for new power-supply trains. Existing neural network-based methods either do not use the semantically rich table content or use it only partially, which limits improvements in execution accuracy. This paper studies how to improve the query accuracy of natural language query interfaces when table content is included in the inputs. To address this problem, it proposes a content-based table column embedding method, which embeds each table column using the content stored in that column, and, based on this method, a new embedding layer structure. It also proposes a data augmentation method that uses table content: new training samples are generated by replacing attribute values in queries with other records from the same column of the table. Experiments on the WikiSQL dataset evaluate the proposed column embedding and data augmentation methods. The results show that, on top of state-of-the-art methods, the two methods improve query accuracy by 0.6% to 0.8% when used separately and by nearly 1% when used together. These results indicate that the proposed column embedding and data augmentation methods yield a measurable improvement in execution accuracy.
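The data augmentation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `augment`, the dictionary-based table layout, and the `(column, operator, value)` condition format are all assumptions made for illustration, and the sketch only handles values that appear verbatim in the question.

```python
import random


def augment(question, sql, table, n_samples=2, seed=0):
    """Create new (question, sql) pairs by swapping a condition value
    for other values stored in the same table column.

    Hypothetical formats (assumptions, not from the paper):
      table: dict mapping column name -> list of cell values
      sql:   dict with a "conds" list of (column, operator, value) tuples
    """
    rng = random.Random(seed)
    augmented = []
    for col, op, value in sql["conds"]:
        old_value = str(value)
        # Only swap values that appear verbatim in the question text.
        if old_value not in question:
            continue
        # Other distinct records from the same column are the candidates.
        candidates = sorted({str(v) for v in table[col]} - {old_value})
        k = min(n_samples, len(candidates))
        for new_value in rng.sample(candidates, k):
            new_question = question.replace(old_value, new_value)
            # Replacement values are stored as strings in the new condition.
            new_conds = [
                (c, o, new_value if (c, o, v) == (col, op, value) else v)
                for c, o, v in sql["conds"]
            ]
            augmented.append((new_question, {**sql, "conds": new_conds}))
    return augmented


# Toy table and query, invented purely for this example.
table = {
    "Player": ["Ann Lee", "Bo Chen", "Cy Diaz"],
    "Team": ["Reds", "Blues", "Reds"],
}
question = "Which team does Ann Lee play for?"
sql = {"select": "Team", "conds": [("Player", "=", "Ann Lee")]}

samples = augment(question, sql, table, n_samples=2)
for new_question, new_sql in samples:
    print(new_question, new_sql["conds"])
```

Each generated pair keeps the original query structure and changes only the attribute value, so the question and its SQL condition stay consistent with each other and with the table.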

Key words: Database query, Natural language processing, SQL, Word embedding

CLC Number: TP391.1