Computer Science, 2021, Vol. 48, Issue (12): 117-124. doi: 10.11896/jsjkx.201100090

• Computer Software •

Automatic Code Comments Generation Method Based on Convolutional Neural Network

PENG Bin, LI Zheng, LIU Yong, WU Yong-hao

  1. College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
  • Received: 2020-11-11  Revised: 2021-04-09  Published: 2021-12-15  Online: 2021-11-26
  • Corresponding author: WU Yong-hao (appmlk@outlook.com)
  • Author contact: 1252031372@qq.com
  • About author: PENG Bin, born in 1994, postgraduate. His main research interests include code comment generation and artificial intelligence.
    WU Yong-hao, born in 1995, Ph.D. candidate. His main research interests include software testing and fault localization.
  • Supported by:
    National Natural Science Foundation of China (61902015, 61872026).

Abstract: Automatic code comment generation analyzes the semantic information of source code and generates a corresponding natural-language description, which helps developers understand programs and reduces the time cost of software maintenance. Most existing techniques are built on encoder-decoder models based on recurrent neural networks (RNNs). However, these models suffer from the long-term dependency problem: when the code blocks being analyzed are far apart, the generated comments are less accurate. To address this, this paper proposes an automatic code comment generation method based on convolutional neural networks (CNNs) that alleviates the long-term dependency problem and produces more accurate comments. Specifically, a source-code-based CNN and an AST-based CNN are constructed to capture the semantic information of the source code. Experimental results show that, compared with two recent methods, DeepCom and Hybrid-DeepCom, the proposed method generates better code comments under the widely used BLEU and METEOR metrics and takes less time to execute.

Key words: Program comprehension, Code comment generation, Convolutional neural network, Long short-term memory network
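The abstract says the method builds one CNN over the source-code tokens and another over the AST, but it does not give the exact architecture. What follows is therefore only a minimal pure-Python sketch of the generic building block such encoders commonly use: a 1D convolution with ReLU over token embeddings, followed by max-over-time pooling. Everything concrete in it is a hypothetical assumption. That includes the embedding tables, the kernel weights, the function names `conv1d_relu`, `max_over_time` and `cnn_encode`, the preorder AST node labels (`Return`, `BinOp`, `Name`), weight sharing between the two encoders, and concatenation as the fusion step. The decoder that turns the encoding into a comment is not shown.

```python
from typing import Dict, List

Vector = List[float]


def conv1d_relu(seq: List[Vector], kernels: List[List[Vector]], bias: List[float]) -> List[Vector]:
    """'Valid' 1D convolution over a sequence of embeddings, followed by ReLU.

    Each kernel holds one weight vector per position of its sliding window;
    the output has one feature per kernel at every window position.
    """
    width = len(kernels[0])
    outputs = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        features = []
        for kernel, b in zip(kernels, bias):
            total = b
            for emb, weights in zip(window, kernel):
                total += sum(e * w for e, w in zip(emb, weights))
            features.append(max(0.0, total))  # ReLU
        outputs.append(features)
    return outputs


def max_over_time(features: List[Vector]) -> Vector:
    """Keep each kernel's strongest response across all window positions."""
    return [max(column) for column in zip(*features)]


def cnn_encode(tokens: List[str], embedding: Dict[str, Vector],
               kernels: List[List[Vector]], bias: List[float]) -> Vector:
    """Map a token sequence to a fixed-size vector (one value per kernel)."""
    seq = [embedding[t] for t in tokens]
    return max_over_time(conv1d_relu(seq, kernels, bias))


# Hypothetical toy weights: 2-dimensional embeddings, two kernels of width 2.
# For brevity both encoders share them; real encoders would learn their own.
KERNELS = [
    [[1.0, 0.0], [0.0, 1.0]],  # responds to dim 0 at position t and dim 1 at t+1
    [[0.0, 1.0], [1.0, 0.0]],  # responds to dim 1 at position t and dim 0 at t+1
]
BIAS = [0.0, -0.5]

CODE_EMB = {"return": [1.0, 0.0], "a": [0.0, 1.0], "+": [1.0, 1.0], "b": [0.0, 1.0]}
AST_EMB = {"Return": [1.0, 0.0], "BinOp": [0.0, 1.0], "Name": [0.5, 0.5]}

# Source-code view: the tokens of `return a + b`.
code_vec = cnn_encode(["return", "a", "+", "b"], CODE_EMB, KERNELS, BIAS)
# AST view: a preorder node sequence for the same statement (hypothetical labels).
ast_vec = cnn_encode(["Return", "BinOp", "Name", "Name"], AST_EMB, KERNELS, BIAS)
# Fuse the two views by concatenation (an assumption, not the paper's fusion step).
encoding = code_vec + ast_vec
print(code_vec, ast_vec, encoding)  # → [2.0, 1.5] [2.0, 1.0] [2.0, 1.5, 2.0, 1.0]
```

Each output position depends only on a fixed window of neighbouring tokens, so all positions can be computed in parallel. Stacking convolution layers widens that window, which gives shorter paths between distant tokens than a step-by-step RNN. This is the general intuition behind using CNNs to mitigate long-term dependencies, not a description of this paper's specific design.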

CLC number: TP311