Computer Science, 2024, Vol. 51, Issue (11A): 231200017-6. doi: 10.11896/jsjkx.231200017

• Intelligent Computing

Regular Expression Generation Based on Natural Language Syntax Information

WANG Hao, WU Junhua

  1. College of Computer and Information Engineering, Nanjing Tech University, Nanjing 211816, China
  • Online: 2024-11-16  Published: 2024-11-13
  • Corresponding author: WU Junhua (wujh@njtech.edu.cn)
  • About author (contact: 17660457668@163.com):
    WANG Hao, born in 1999, postgraduate. His main research interests include software engineering and natural language processing.
    WU Junhua, born in 1965, Ph.D., professor. Her main research interests include software engineering, program analysis and natural language processing.
  • Supported by:
    Key Research Project on Higher Education Informatization of the Higher Education Technology Research Association in Jiangsu Province (2021JSETKT023).

Abstract: Regular expressions are composed of a series of characters and metacharacters; they define a matching pattern that can be used to check whether a string matches the desired criteria. Many developers find it difficult to write regular expressions during software development. Therefore, generating regular expressions from natural language requirement descriptions has become a research focus. In recent years, systems that transform natural language descriptions into regular expressions have achieved some results, but they often handle only simple serialized text. This paper explores methods for converting natural language queries into regular expressions that implement the described functionality. Given the successful application of syntactic parsing in natural language processing, the proposed model exploits the structural information of natural language by embedding syntax parse trees in a hierarchically aggregated manner, and it encodes the natural language description with self-attention using a Tree-transformer architecture suited to tree-structured input. The decoder uses cross-attention to predict the regular expression. The model is validated on two public datasets. Experimental results show that it effectively improves the quality of the generated regular expressions and outperforms existing models on the DFA-Equal-Acc evaluation metric.
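A minimal Python sketch of the two encoder ideas named in the abstract, hierarchical aggregation over a syntax parse tree and tree-aware self-attention, is given here as an illustration only; it is not the authors' implementation. The `Node` class, the seeded `token_vector` stand-in for a learned embedding table, the aggregation rule (own label vector plus the mean of the children's vectors), and the attention mask that lets each node see only itself, its parent and its children are all hypothetical choices. The learned query/key/value projections of a real Transformer are omitted.

```python
import math
import random
from dataclasses import dataclass, field

DIM = 8  # toy embedding size (assumption)


def token_vector(token: str) -> list[float]:
    """Hypothetical stand-in for a learned embedding table:
    a fixed pseudo-random vector per token, seeded by the token string."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


@dataclass
class Node:
    label: str  # a word for leaves, a phrase tag (NP, VP, ...) otherwise
    children: list["Node"] = field(default_factory=list)


def embed_tree(root: Node) -> tuple[list[list[float]], list[int]]:
    """Bottom-up (post-order) aggregation: a node's vector is its label
    vector plus the mean of its children's vectors (assumed rule).
    Returns node vectors in post-order and each node's parent index (-1 = root)."""
    vectors: list[list[float]] = []
    parents: list[int] = []

    def visit(node: Node) -> int:
        child_ids = [visit(c) for c in node.children]  # children first
        vec = token_vector(node.label)
        if child_ids:
            for d in range(DIM):
                vec[d] += sum(vectors[i][d] for i in child_ids) / len(child_ids)
        vectors.append(vec)
        parents.append(-1)
        my_id = len(vectors) - 1
        for i in child_ids:
            parents[i] = my_id
        return my_id

    visit(root)
    return vectors, parents


def tree_self_attention(vectors, parents):
    """Scaled dot-product self-attention restricted by a tree mask:
    node i attends only to itself, its parent and its children."""
    n = len(vectors)
    allowed = [{i} for i in range(n)]
    for child, parent in enumerate(parents):
        if parent >= 0:
            allowed[child].add(parent)
            allowed[parent].add(child)
    outputs, weights = [], []
    for i in range(n):
        keys = sorted(allowed[i])
        scores = [sum(a * b for a, b in zip(vectors[i], vectors[j])) / math.sqrt(DIM)
                  for j in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(dict(zip(keys, w)))
        outputs.append([sum(wj * vectors[j][d] for wj, j in zip(w, keys))
                        for d in range(DIM)])
    return outputs, weights


# Toy parse: (S (NP lines) (VP (VB containing) (NP digits)))
tree = Node("S", [
    Node("NP", [Node("lines")]),
    Node("VP", [Node("VB", [Node("containing")]),
                Node("NP", [Node("digits")])]),
])
vecs, parents = embed_tree(tree)
out, attn = tree_self_attention(vecs, parents)
print(parents)          # [1, 7, 3, 6, 5, 6, 7, -1]
print(sorted(attn[7]))  # [1, 6, 7]  (root S sees itself, NP and VP)
```

Because aggregation runs bottom-up, a phrase node such as `VP` is already a function of its whole subtree before attention runs, so one masked attention step mixes information between neighboring constituents of the parse. A real model would learn the embeddings and projections and stack several layers.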

Key words: Regular expression generation, Tree-Transformer, Syntactic parsing

CLC number: TP391
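The abstract reports results on DFA-Equal-Acc. As commonly used in regex-generation work, this metric counts a predicted regex as correct when it is semantically equivalent to the gold regex, meaning the two accept the same language as checked through their DFAs, rather than when the two are identical strings. The Python sketch here swaps in a simpler, weaker check: it compares two patterns on every string up to a bounded length over a small alphabet. A mismatch is therefore a genuine counterexample, but agreement is not a proof of equivalence. The sketch also assumes Python `re` syntax, whereas benchmark datasets use their own regex notation. `bounded_equivalent` and `equivalence_accuracy` are hypothetical names.

```python
import itertools
import re


def bounded_equivalent(pattern_a: str, pattern_b: str, alphabet: str, max_len: int) -> bool:
    """Proxy for regex equivalence (NOT a DFA equivalence proof):
    True if both patterns fully match exactly the same strings among
    all strings over `alphabet` of length <= max_len."""
    a, b = re.compile(pattern_a), re.compile(pattern_b)
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if (a.fullmatch(s) is None) != (b.fullmatch(s) is None):
                return False  # s is a counterexample
    return True


def equivalence_accuracy(predictions, references, alphabet="a1 ", max_len=5):
    """Fraction of predictions judged equivalent to their reference regex."""
    hits = sum(bounded_equivalent(p, r, alphabet, max_len)
               for p, r in zip(predictions, references))
    return hits / len(references)


preds = [r"[0-9][0-9]*", r"a+", r"[a-z]*1"]
golds = [r"[0-9]+", r"aa*", r"[a-z]+1"]
exact = sum(p == g for p, g in zip(preds, golds)) / len(golds)
print(exact)                               # 0.0
print(equivalence_accuracy(preds, golds))  # 0.6666666666666666
```

None of the three predictions matches its reference as a string, yet the first two describe the same language as their references; the third accepts "1" while its reference does not, which is why an equivalence-based metric scores 2/3 where exact match scores 0.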