Computer Science ›› 2021, Vol. 48 ›› Issue (12): 117-124. DOI: 10.11896/jsjkx.201100090

• Computer Software •

Automatic Code Comments Generation Method Based on Convolutional Neural Network

PENG Bin, LI Zheng, LIU Yong, WU Yong-hao

  1. College of Information Science and Technology,Beijing University of Chemical Technology,Beijing 100029,China
  • Received:2020-11-11 Revised:2021-04-09 Online:2021-12-15 Published:2021-11-26
  • Corresponding author: WU Yong-hao(appmlk@outlook.com)
  • About author:PENG Bin,born in 1994,postgraduate.His main research interests include code comment generation and artificial intelligence.(1252031372@qq.com)
    WU Yong-hao,born in 1995,Ph.D. candidate.His main research interests include software testing and fault localization.
  • Supported by:
    National Natural Science Foundation of China(61902015,61872026).


Abstract: Automatic code comment generation techniques analyze the semantic information of source code and generate corresponding natural language descriptions, which helps developers understand programs and reduces the time cost of software maintenance. Most existing techniques are based on encoder-decoder models built on recurrent neural networks (RNN). However, such models suffer from the long-term dependency problem: they cannot generate high-quality comments when the relevant code blocks are far apart. To address this problem, this paper proposes an automatic code comment generation method that uses convolutional neural networks (CNN) to alleviate the inaccurate comments caused by the long-term dependency problem. More specifically, the method uses two CNNs, a source-code-based CNN and an AST-based CNN, to capture the semantic information of source code. The experimental results indicate that, compared with two state-of-the-art methods, DeepCom and Hybrid-DeepCom, the proposed method generates better code comments under the commonly used BLEU and METEOR metrics and takes less time to execute.
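To illustrate the idea behind the dual-CNN encoder described above (this is a minimal NumPy sketch under assumed shapes and random weights, not the paper's actual architecture), a 1D convolution slides a fixed-width window over an embedded token sequence, so each output depends only on a local context rather than on a recurrent state. One convolution runs over the source-token embeddings and a second over a flattened AST-node sequence; max-pooling and concatenation yield a fixed-size context vector for a decoder:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels):
    # x: (seq_len, emb_dim); kernels: (n_filters, width, emb_dim)
    n_filters, width, _ = kernels.shape
    seq_len = x.shape[0]
    out = np.zeros((seq_len - width + 1, n_filters))
    for t in range(seq_len - width + 1):
        window = x[t:t + width]  # local context only, no recurrence
        out[t] = np.maximum(0.0, np.einsum('fwd,wd->f', kernels, window))  # ReLU
    return out

# Two parallel encoders: one over source tokens, one over a flattened AST.
emb_dim, n_filters, width = 8, 4, 3
code_emb = rng.normal(size=(10, emb_dim))  # embedded source tokens (assumed length 10)
ast_emb = rng.normal(size=(12, emb_dim))   # embedded AST-node sequence (assumed length 12)
k_code = rng.normal(size=(n_filters, width, emb_dim))
k_ast = rng.normal(size=(n_filters, width, emb_dim))

# Max-pool each feature map over time, then concatenate the two views.
code_vec = conv1d(code_emb, k_code).max(axis=0)
ast_vec = conv1d(ast_emb, k_ast).max(axis=0)
context = np.concatenate([code_vec, ast_vec])  # would be fed to a decoder
print(context.shape)
```

Because every window is processed independently, distant code blocks never have to pass information through a long chain of recurrent states, which is the intuition for why a CNN encoder sidesteps the long-term dependency problem.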

Key words: Code comment generation, Convolutional neural network, Long short-term memory network, Program comprehension
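The BLEU metric used in the evaluation above compares generated comments against reference comments via clipped n-gram precision and a brevity penalty. A minimal sentence-level sketch (standard BLEU as in Papineni et al., not the exact evaluation script used in the paper):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, candidate, max_n=4):
    # Modified n-gram precision: candidate counts are clipped by reference counts.
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        if total == 0:
            precisions.append(0.0)
            continue
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "returns the index of the first matching element".split()
cand = "returns the index of the first matching element".split()
assert abs(sentence_bleu(ref, cand) - 1.0) < 1e-9  # identical sentences score 1.0
```

METEOR additionally aligns stems and synonyms and weighs recall, which is why the two metrics are usually reported together.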

CLC Number: TP311

References
[1]XIA X,BAO L,LO D,et al.Measuring program comprehension:A large-scale field study with professionals[J].IEEE Transactions on Software Engineering,2017,44(10):951-976.
[2]HU X,LI G,XIA X,et al.Deep code comment generation[C]//2018 IEEE/ACM 26th International Conference on Program Comprehension(ICPC).IEEE,2018:200-210.
[3]CHEN X,YANG G,CUI Z Q,et al.State-of-the-Art survey of Automatic Code Comment Generation[J].Journal of Software,2021,32(7):2118-2141.
[4]SONG X,SUN H,WANG X,et al.A survey of automatic generation of source code comments:Algorithms and techniques[J].IEEE Access,2019,7:111411-111428.
[5]ZHU Y,PAN M.Automatic Code Summarization:A Systematic Literature Review[J].arXiv:1909.04352,2019.
[6]RODEGHERO P,LIU C,MCBURNEY P W,et al.An eye-tracking study of Java programmers and application to source code summarization[J].IEEE Transactions on Software Engineering,2015,41(11):1038-1054.
[7]MORENO L,APONTE J,SRIDHARA G,et al.Automatic generation of natural language summaries for Java classes[C]//2013 21st International Conference on Program Comprehension(ICPC).IEEE,2013:23-32.
[8]LECLAIR A,JIANG S,MCMILLAN C.A neural model for generating natural language summaries of program subroutines[C]//2019 IEEE/ACM 41st International Conference on Software Engineering(ICSE).IEEE,2019:795-806.
[9]SUTSKEVER I,VINYALS O,LE Q V.Sequence to sequence learning with neural networks[C]//Advances in Neural Information Processing Systems.2014:3104-3112.
[10]SUN Z,ZHU Q,MOU L,et al.A grammar-based structural CNN decoder for code generation[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2019,33:7055-7062.
[11]LECLAIR A,HAQUE S,WU L,et al.Improved code summarization via a graph neural network[J].arXiv:2004.02843,2020.
[12]SHIDO Y,KOBAYASHI Y,YAMAMOTO A,et al.Automatic source code summarization with extended Tree-LSTM[C]//2019 International Joint Conference on Neural Networks(IJCNN).IEEE,2019:1-8.
[13]CHEN Q,HU H,LIU Z.Code Summarization with Abstract Syntax Tree[C]//International Conference on Neural Information Processing.Cham:Springer,2019:652-660.
[14]LECHNER M,HASANI R.Learning Long-Term Dependencies in Irregularly-Sampled Time Series[J].arXiv:2006.04418,2020.
[15]HU X,LI G,XIA X,et al.Summarizing source code with transferred API knowledge[C]//2018 27th International Joint Conference on Artificial Intelligence.2018:1-9.
[16]HU X,LI G,XIA X,et al.Deep code comment generation with hybrid lexical and syntactical information[J].Empirical Software Engineering,2020,25(3):2179-2217.
[17]PAPINENI K,ROUKOS S,WARD T,et al.BLEU:a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.2002:311-318.
[18]BANERJEE S,LAVIE A.METEOR:An automatic metric for MT evaluation with improved correlation with human judgments[C]//Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation AND/OR Summarization.2005:65-72.
[19]XIA Q,YEH C H,CHEN X Y.A Deep Bidirectional Highway Long Short-Term Memory Network Approach to Chinese Semantic Role Labeling[C]//2019 International Joint Conference on Neural Networks(IJCNN).IEEE,2019:1-6.
[20]CHUNG J,GULCEHRE C,CHO K H,et al.Empirical evaluation of gated recurrent neural networks on sequence modeling[J].arXiv:1412.3555,2014.
[21]SZEGEDY C,LIU W,JIA Y,et al.Going deeper with convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2015:1-9.
[22]REN S,HE K,GIRSHICK R,et al.Faster R-CNN:Towards real-time object detection with region proposal networks[J].Advances in Neural Information Processing Systems,2015,28:91-99.
[23]GUO B,ZHANG C,LIU J,et al.Improving text classification with weighted word embeddings via a multi-channel TextCNN model[J].Neurocomputing,2019,363:366-374.
[24]SHEN Y,HE X,GAO J,et al.Learning semantic representations using convolutional neural networks for web search[C]//Proceedings of the 23rd International Conference on World Wide Web.2014:373-374.
[25]GU X,ZHANG H,ZHANG D,et al.Deep API learning[C]//Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering.2016:631-642.
[26]BAHDANAU D,CHO K,BENGIO Y.Neural machine translation by jointly learning to align and translate[J].arXiv:1409.0473,2014.
[27]LI Y,WANG Q,XIAO T,et al.Neural Machine Translation with Joint Representation[C]//AAAI.2020:8285-8292.
[28]WEI B,LI G,XIA X,et al.Code generation as a dual task of code summarization[C]//Advances in Neural Information Processing Systems.2019:6563-6573.
[29]DENKOWSKI M,LAVIE A.Meteor universal:Language specific translation evaluation for any target language[C]//Proceedings of the Ninth workshop on Statistical Machine Translation.2014:376-380.