Computer Science (计算机科学), 2024, Vol. 51, Issue 11A: 231200017-6. doi: 10.11896/jsjkx.231200017
WANG Hao (王昊), WU Junhua (吴军华)
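Abstract: A regular expression is a sequence of characters and metacharacters that defines a matching rule and can be used to check whether a string conforms to a required pattern. In software development, many developers find regular expressions difficult to write, so generating them from natural-language requirement descriptions has become an active research topic. Systems that translate natural-language descriptions into regular expressions have made progress in recent years, but most of them handle only simple, serialized text. This paper explores a method for converting a natural-language query into a regular expression that performs the described function. Motivated by the success of syntactic parsing in natural language processing, the model exploits the structural information of natural language: it embeds the syntactic parse tree through hierarchical aggregation and encodes the natural-language description with self-attention, using a Tree-transformer architecture suited to tree-structured input. The decoder uses cross-attention to predict the regular expression. The model was evaluated on two public datasets. The experiments show that the proposed model effectively improves the quality of the generated regular expressions and outperforms existing models on the DFA-Equal-Acc metric.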
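The abstract does not define DFA-Equal-Acc. In related work on regex synthesis the name usually refers to the share of generated expressions whose DFA is equivalent to that of the reference expression, so that two differently written patterns count as the same answer when they accept the same language. The sketch below is not the paper's evaluation code and does not build DFAs. It is a rough, bounded approximation that compares two patterns on every string up to a fixed length, only to illustrate why semantic rather than textual comparison is needed. The function name `agree_up_to` and its parameters are hypothetical.

```python
import re
from itertools import product


def agree_up_to(pattern_a: str, pattern_b: str, alphabet: str, max_len: int) -> bool:
    """Return True if both patterns accept exactly the same strings over
    `alphabet` with length 0..max_len.

    This is a bounded check, not a true DFA-equivalence test: two patterns
    that differ only on longer strings would still be reported as agreeing.
    """
    regex_a = re.compile(pattern_a)
    regex_b = re.compile(pattern_b)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            text = "".join(chars)
            # fullmatch requires the whole string to match the pattern
            accepted_a = regex_a.fullmatch(text) is not None
            accepted_b = regex_b.fullmatch(text) is not None
            if accepted_a != accepted_b:
                return False
    return True


# Textually different, but the same language: strings over {a, b} ending in "a".
same = agree_up_to(r"(a|b)*a", r"[ab]*a", alphabet="ab", max_len=6)

# Different languages: "a*" accepts the empty string, "a+" does not.
different = agree_up_to(r"a+", r"a*", alphabet="ab", max_len=3)
```

A string-equality comparison would mark `(a|b)*a` and `[ab]*a` as a mismatch, which is why metrics based on language equivalence are preferred for this task.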