Computer Science ›› 2024, Vol. 51 ›› Issue (6): 61-67. doi: 10.11896/jsjkx.230400137

• Computer Software •

Prompt Learning Based Parameter-efficient Code Generation

XU Yiran1, ZHOU Yu1,2   

  1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  2 Ministry Key Laboratory for Safety-Critical Software Development and Verification, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
  • Received: 2023-04-20 Revised: 2023-09-26 Online: 2024-06-15 Published: 2024-06-05
  • About author: XU Yiran, born in 1999, postgraduate. His main research interests include intelligent software development, code generation, and natural language processing.
    ZHOU Yu, born in 1980, postdoctor, professor. His main research interests include software evolution analysis, mining software repositories, software architecture, and reliability analysis.
  • Supported by:
    National Natural Science Foundation of China (61972197), Defense Industrial Technology Development Program (JCKY2022605C006) and Natural Science Foundation of Jiangsu Province, China (BK20201292).

Abstract: Automatic code generation is an effective way to improve the efficiency of software development. Existing research often treats code generation as a sequence-to-sequence task, and fine-tuning large-scale pre-trained language models for it incurs a high computing cost. This paper proposes PPECG, a prompt learning based parameter-efficient code generation method. PPECG retrieves from a code corpus the result most similar to the current intent and uses it to guide the pre-trained language model in generating code; most of the model's parameters stay fixed during this process, which reduces the computing cost. PPECG is evaluated on two code generation datasets, CONCODE and Solidity4CG, by computing the BLEU, CodeBLEU and Exact Match scores of the generated results. Experimental results show that PPECG effectively reduces GPU memory cost during fine-tuning and performs close to, or even better than, the current state-of-the-art (SOTA) methods on these benchmarks, indicating that it handles code generation tasks well.
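The retrieval step and the Exact Match metric mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: the abstract does not say which similarity measure, prompt template, or normalization PPECG uses, so token-level Jaccard similarity, the `build_prompt` layout, and whitespace normalization are stand-ins, and all function names are hypothetical. Freezing the model's parameters is not shown, since it depends on the training framework.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed stand-in for the paper's retriever)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve_most_similar(intent, corpus):
    """Return the (intent, code) pair whose intent is most similar to the query."""
    return max(corpus, key=lambda pair: jaccard(intent, pair[0]))


def build_prompt(intent, corpus):
    """Prepend the retrieved code to the intent (hypothetical prompt layout)."""
    _, similar_code = retrieve_most_similar(intent, corpus)
    return f"Similar code: {similar_code}\nIntent: {intent}\nCode:"


def exact_match(predictions, references):
    """Fraction of predictions equal to their reference after whitespace normalization."""
    if not references:
        return 0.0

    def norm(s):
        return " ".join(s.split())

    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Toy corpus of (intent, code) pairs, invented for this example.
corpus = [
    ("return the sum of two integers", "int add(int a, int b) { return a + b; }"),
    ("check whether a list is empty", "boolean isEmpty(List<?> xs) { return xs.isEmpty(); }"),
]
prompt = build_prompt("compute the sum of two numbers", corpus)
print(prompt)
```

In this toy run the query shares the tokens "the", "sum", "of" and "two" with the first corpus intent and none with the second, so the `add` snippet is the one placed in the prompt. A real system would pass such a prompt to a pre-trained language model while updating only a small subset of its parameters.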

Key words: Code generation, Prompt learning, Pre-trained language model, Information retrieval, Smart contract

CLC Number: 

  • TP311