Computer Science, 2020, Vol. 47, Issue (9): 60-66. doi: 10.11896/jsjkx.190800138

• Database & Big Data & Data Science •

Natural Language Interface for Databases with Content-based Table Column Embeddings

TIAN Ye1, SHOU Li-dan1,2, CHEN Ke1,2, LUO Xin-yuan1,2, CHEN Gang1,2   

  1 College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  2 Key Laboratory of Big Data Intelligent Computing of Zhejiang Province, Hangzhou 310027, China
  • Received: 2019-08-28 Published: 2020-09-10
  • About author: TIAN Ye, born in 1996, postgraduate. His main research interests include knowledge graph and natural language processing.
    CHEN Ke, born in 1977, Ph.D, associate professor. Her main research interests include spatial-temporal data management, Web data mining and data privacy protection, etc.
  • Supported by:
    National Key R&D Program of China (2017YFB1201001), National Natural Science Foundation of China (61672455) and Natural Science Foundation of Zhejiang Province, China (LY18F020005).

Abstract: Converting natural language into query statements that can be executed in a database is a core problem of intelligent interaction and human-computer dialogue systems, and is also an urgent need of personalized operation and maintenance systems for urban rail trains. It is also a key difficulty in connecting the underlying application platform with the big data application support platform for new power supply trains. Existing neural network-based methods either do not use the semantically rich table content or use it only partially, which limits improvements in execution accuracy. This paper studies how to improve the query accuracy of natural language query interfaces when table content is included in the inputs. To address this problem, it proposes a content-based table column embedding method, which embeds each table column using the content stored in that column. Based on this method, it proposes a new structure for the embedding layer. It also proposes a data augmentation method that uses table content: new training samples are generated by replacing attribute values in queries with other records from the same column of the table. Finally, experiments on the WikiSQL dataset evaluate the proposed column embedding and data augmentation methods. The results show that, on top of state-of-the-art methods, the two methods improve query accuracy by 0.6% to 0.8% when used separately and by nearly 1% when used together. These results indicate that the proposed column embedding and data augmentation methods yield good improvements in execution accuracy.
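
The data augmentation idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `augment_example`, the representation of a table as a dictionary of columns, and the representation of query conditions as (column, value) pairs are all assumptions made for illustration, and the sketch assumes the attribute value appears verbatim in the question.

```python
import random


def augment_example(question, conditions, table, rng, n_new=2):
    """Build new (question, conditions) pairs by swapping condition values.

    question:   natural-language question (str)
    conditions: list of (column_name, value) pairs from the WHERE clause
    table:      dict mapping column_name -> list of cell values (str)
    rng:        random.Random instance, passed in for reproducibility
    All names here are hypothetical; the paper's own data format may differ.
    """
    samples = []
    for _ in range(n_new):
        new_question = question
        new_conditions = []
        for column, value in conditions:
            # Candidate replacements are other records in the same column.
            candidates = [v for v in table[column] if v != value]
            if not candidates or value not in new_question:
                # Nothing to swap: keep the original condition unchanged.
                new_conditions.append((column, value))
                continue
            replacement = rng.choice(candidates)
            # Replace the value in the question and in the condition together,
            # so the question and its query stay consistent.
            new_question = new_question.replace(value, replacement, 1)
            new_conditions.append((column, replacement))
        if new_question != question:
            samples.append((new_question, new_conditions))
    return samples


table = {"Player": ["Tim Duncan", "Tony Parker", "Manu Ginobili"]}
question = "Which team does Tim Duncan play for?"
conditions = [("Player", "Tim Duncan")]
augmented = augment_example(question, conditions, table, random.Random(0), n_new=3)
```

Each generated sample keeps the question and the query condition in sync, which is the property that lets such samples serve as additional training data.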

Key words: Database query, Natural language processing, SQL, Word embedding

CLC Number: 

  • TP391.1