Computer Science, 2024, Vol. 51, Issue (11A): 231200017-6. doi: 10.11896/jsjkx.231200017

• Intelligent Computing •

Regular Expression Generation Based on Natural Language Syntax Information

WANG Hao, WU Junhua

  1. College of Computer and Information Engineering, Nanjing Tech University, Nanjing 211816, China
  • Online: 2024-11-16 Published: 2024-11-13
  • About author: WANG Hao, born in 1999, postgraduate. His main research interests include software engineering and natural language processing.
    WU Junhua, born in 1965, Ph.D., professor. Her main research interests include software engineering, program analysis and natural language processing.
  • Supported by:
    Key Research Project on Higher Education Informatization of the Higher Education Technology Research Association in Jiangsu Province (2021JSETKT023).

Abstract: Regular expressions are composed of a series of characters and metacharacters that define a matching pattern, which can be used to check whether a string meets the desired criteria. Many developers find it difficult to write regular expressions during software development. Generating regular expressions from natural language requirements has therefore become a research focus. In recent years, systems that transform natural language descriptions into regular expressions have achieved some results, but they often handle only simple, serialized text. This paper explores methods for converting natural language queries into regular expressions that carry out the intended functionality. Given the successful application of syntactic parsing in natural language processing, our model exploits the structural information of natural language by embedding syntax parse trees in a hierarchically aggregated manner. We employ the Tree-Transformer architecture, which is suited to tree-structured input, to perform self-attention encoding on natural language descriptions. The decoder uses cross-attention to predict the regular expression. The model is validated on two public datasets. Experimental results demonstrate that our model effectively improves the quality of the generated regular expressions and outperforms existing models on the DFA-Equal-Acc evaluation metric.
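The abstract reports results on DFA-Equal-Acc without defining it. The metric is generally understood to count a predicted regular expression as correct when it accepts the same language as the reference, even if the two strings differ. The Python sketch below illustrates that idea under a strong simplifying assumption: instead of building and comparing DFAs, it compares the two patterns on every string up to a fixed length over a small alphabet. The function names `bounded_equivalent` and `bounded_equal_acc`, the alphabet, and the length bound are hypothetical and are not taken from the paper.

```python
import re
from itertools import product


def bounded_equivalent(regex_a: str, regex_b: str,
                       alphabet: str = "ab1", max_len: int = 5) -> bool:
    """Approximate language equivalence of two regexes.

    Returns True if both patterns accept exactly the same strings among
    all strings over `alphabet` with length <= max_len. This is only a
    bounded check, not a true DFA equivalence test.
    """
    pattern_a = re.compile(regex_a)
    pattern_b = re.compile(regex_b)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            accepted_a = pattern_a.fullmatch(candidate) is not None
            accepted_b = pattern_b.fullmatch(candidate) is not None
            if accepted_a != accepted_b:
                return False
    return True


def bounded_equal_acc(predictions, references, **kwargs) -> float:
    """Fraction of predictions that pass the bounded equivalence check."""
    hits = sum(
        bounded_equivalent(pred, ref, **kwargs)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)


if __name__ == "__main__":
    preds = ["(a|b)*", "a+"]
    refs = ["(b|a)*", "a*"]
    # The first pair differs as text but matches the same strings;
    # the second pair disagrees on the empty string.
    print(bounded_equal_acc(preds, refs))
```

An exact string comparison would score both pairs above as wrong, whereas an equivalence-based metric credits the first pair. A faithful implementation would convert each expression to a minimal DFA and compare the automata rather than enumerate strings.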

Key words: Regular expression generation, Tree-Transformer, Syntactic parsing

CLC Number: TP391