基于多维度特征和混合神经网络的代码可读性评估方法

doi:10.11896/jsjkx.200800193

计算机科学 ›› 2021, Vol. 48 ›› Issue (12): 94-99.doi: 10.11896/jsjkx.200800193

基于多维度特征和混合神经网络的代码可读性评估方法

米庆, 郭黎敏, 陈军成

北京工业大学信息学部北京100124

收稿日期:2020-08-31 修回日期:2021-02-15 出版日期:2021-12-15 发布日期:2021-11-26
通讯作者: 陈军成(juncheng@bjut.edu.cn)
作者简介:miqing@bjut.edu.cn
基金资助:
国家自然科学基金(61702029);北京市自然科学基金(4192004);北京市教委项目(KM201810005023)

Code Readability Assessment Method Based on Multidimensional Features and Hybrid Neural Networks

MI Qing, GUO Li-min, CHEN Jun-cheng

Faculty of Information Technology,Beijing University of Technology,Beijing 100124,China

Received:2020-08-31 Revised:2021-02-15 Online:2021-12-15 Published:2021-11-26
About author:MI Qing,born in 1987,Ph.D,lecturer,is a member of China Computer Federation.Her main research interests include code readability assessment,deep learning and empirical experiments.
CHEN Jun-cheng,born in 1980,Ph.D,lecturer,is a member of ChinaCompu-ter Federation.His main research intere-sts include software testing,compiler optimization,machine learning and deep learning.
Supported by:
National Natural Science Foundation of China(61702029),Natural Science Foundation of Beijing,China(4192004) and Project of Beijing Municipal Education Commission(KM201810005023).

摘要/Abstract

摘要： 对代码可读性进行定量、准确的评估是有效保障软件质量、降低沟通成本以及维护成本、提高软件开发和演化效率的重要途径。然而,现有的针对代码可读性评估的研究方案大多是基于特征工程的,受到源代码表征方式、技术手段等多方面因素影响,其评估准确率并不高。为此,文中采用深度学习作为主要技术手段,提出了一种基于多维度特征和混合神经网络的代码可读性评估方法,通过整合并运用各种单一神经网络的优势,从字符级、词条级等不同维度挖掘源代码中蕴含的结构信息和语义信息,最终实现对代码可读性的量化评估。实验表明,该方法能够获得高达84.6%的评估准确率,比单独使用卷积神经网络提升了9.2%,比单独使用循环神经网络模型提升了6.5%,并且其表现优于现有的5个可读性模型,验证了所提出的多维度特征和混合神经网络的有效性。

关键词: 代码表征, 代码分析, 代码可读性, 软件质量保障, 深度学习

Abstract: Quantitative and accurate assessment of code readability is an important way to ensure software quality,reduce communication and maintenance costs,and improve the efficiency of software development and evolution.However,existing code readability studies depend mainly on the manual feature engineering method,which is likely to limit the model performance due to factors such as code representation strategies and technical means.Unlike prior studies,we propose a novel code readability assessment method based on multidimensional features and hybrid neural networks by using the technique of deep learning.Specifi-cally,we first propose a representation strategy with different granularity levels to transform source codes into matrices and vectors as the input to deep neural networks.We then build a CNN-BiGRU hybrid neural network that can automatically learn structural and semantic features from the source code.The experimental results show that our method is able to achieve an accuracy of 84.6%,which is 9.2% higher than CNN alone and 6.5% higher than BiGRU alone.Moreover,our method can outperform five state-of-the-art code readability models,which confirms the feasibility and effectiveness of multidimensional features and hybrid neural networks proposed in this study.

Key words: Code analysis, Code readability, Code representation, Deep learning, Software quality assurance

中图分类号:

TP311

米庆, 郭黎敏, 陈军成. 基于多维度特征和混合神经网络的代码可读性评估方法[J]. 计算机科学, 2021, 48(12): 94-99. https://doi.org/10.11896/jsjkx.200800193

MI Qing, GUO Li-min, CHEN Jun-cheng. Code Readability Assessment Method Based on Multidimensional Features and Hybrid Neural Networks[J]. Computer Science, 2021, 48(12): 94-99. https://doi.org/10.11896/jsjkx.200800193

参考文献

[1]HOOIMEIJER P,WEIMER W.Modeling bug report quality [C]//Proc. Twenty-Second IEEE/ACM Int.Conf.Autom.Softw.Eng.(ASE '07).2007:34.
[2]BUSE R P L,WEIMER W R.Learning a Metric for Code Rea- dability[J].IEEE Trans.Softw.Eng.,2010,3(4):546-558.
[3]SIVAPRAKASAM P.Improving Software Quality Through the Development of Code Readability[J].International Journal of Advanced Research in Computer and Communication Enginee-ring,2012,1(6):472-477.
[4]BUSE R P L,WEIMER W R.A metric for software readability [C]//Proceedings of the 2008 International Symposium on Software Testing and Analysis(ISSTA '08).2008:121.
[5]BOSWELL D,FOUCHER T.The Art of Readable Code:Simple and Practical Techniques for Writing Better Code[C]//O'Reilly Media,Inc..2011.
[6]FAKHOURY S,ROY D,HASSAN A,et al.Improving source code readability:theory and practice[C]//2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC).2019:2-12.
[7]SANTOS R M D,GEROSA M A.Impacts of coding practices on readability[C]//Proc.Int.Conf.Softw.Eng..2018:277-285.
[8]TASHTOUSH Y,ODAT Z,ALSMADI I,et al.Impact of Programming Features on Code Readability[J].Int.J.Softw.Eng.Its Appl.,2013,7(6):441-458.
[9]POSNETT D,HINDLE A,DEVANBU P.A simpler model of software readability[C]//Proceeding of the 8th Working Conference on Mining Software Repositories(MSR '11).2011:73.
[10]SCALABRINO S,LINARES-VASQUEZ M,POSHYVANYK D,et al.Improving code readability models with textual features[C]//2016 IEEE 24th International Conference on Program Comprehension (ICPC).2016:1-10.
[11]DORN J.A General Software Readability Model[D].Virginia:Univ.Virginia,Charlottesville,2012.
[12]CROOKES D.Generating readable software[J].Softw.Eng.J.,1987,2(3):64-70.
[13]BAECKER R.Enhancing program readability and comprehensibility with tools for program visualization[OL].https://dl.acm.org/doi/10.5555/55823.55858.
[14]BINKLEY D,DAVIS M,LAWRIE D,et al.To camelcase or under_score[C]//2009 IEEE 17th International Conference on Program Comprehension.2009:158-167.
[15]SHARIF B,MALETIC J I.An Eye Tracking Study on Camelcase and Under_score Identifier Styles[C]//2010 IEEE 18th International Conference on Program Comprehension.2010:196-205.
[16]BUSE R P L,ZIMMERMANN T.Information needs for software development analytics[C]//2012 34th International Conference on Software Engineering (ICSE).2012:987-996.
[17]AGGARWAL K K,SINGH Y,CHHABRA J K.An integrated measure of software maintainability[C]//Annual Reliability and Maintainability Symposium.2002:235-241.
[18]BÖRSTLER J,CASPERSEN M E,NORDSTRÖM M.Beauty and the beast:on the readability of object-oriented example programs[J].Softw.Qual.J.,2016,24(2):231-246.
[19]MI Q,KEUNG J,XIAO Y,et al.Improving code readability classification using convolutional neural networks[J].Inf.Softw.Technol.,2018,104:60-71.
[20]MAAS A L,HANNUN A Y,NG A Y.Rectifier Nonlinearities Improve Neural Network Acoustic Models[C]//Proc.30th Int.Conf.Mach.Learn..2013.
[21]KINGMA D P,BA J.Adam:A Method for Stochastic Optimization[J].arXiv:1412.6980v5.
[22]LIKERT R.A technique for the measurement of attitudes[OL].https://psycnet.apa.org/record/1933-01885-001.
[23]NEUBERT K,BRUNNER E.A studentized permutation test for the non-parametric Behrens-Fisher problem[J].Comput.Stat.Data Anal.,2007,51(10):5192-5204.
[24]WANG S,LIU T,TAN L.Automatically Learning Semantic Features for Defect Prediction[C]//2016 IEEE/ACM 38th International Confernce on Software Engineering.2016:297-308.

相关文章 15

[1]	饶志双, 贾真, 张凡, 李天瑞. 基于Key-Value关联记忆网络的知识图谱问答方法 Key-Value Relational Memory Networks for Question Answering over Knowledge Graph 计算机科学, 2022, 49(9): 202-207. https://doi.org/10.11896/jsjkx.220300277
[2]	汤凌韬, 王迪, 张鲁飞, 刘盛云. 基于安全多方计算和差分隐私的联邦学习方案 Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy 计算机科学, 2022, 49(9): 297-305. https://doi.org/10.11896/jsjkx.210800108
[3]	徐涌鑫, 赵俊峰, 王亚沙, 谢冰, 杨恺. 时序知识图谱表示学习 Temporal Knowledge Graph Representation Learning 计算机科学, 2022, 49(9): 162-171. https://doi.org/10.11896/jsjkx.220500204
[4]	王剑, 彭雨琦, 赵宇斐, 杨健. 基于深度学习的社交网络舆情信息抽取方法综述 Survey of Social Network Public Opinion Information Extraction Based on Deep Learning 计算机科学, 2022, 49(8): 279-293. https://doi.org/10.11896/jsjkx.220300099
[5]	郝志荣, 陈龙, 黄嘉成. 面向文本分类的类别区分式通用对抗攻击方法 Class Discriminative Universal Adversarial Attack for Text Classification 计算机科学, 2022, 49(8): 323-329. https://doi.org/10.11896/jsjkx.220200077
[6]	姜梦函, 李邵梅, 郑洪浩, 张建朋. 基于改进位置编码的谣言检测模型 Rumor Detection Model Based on Improved Position Embedding 计算机科学, 2022, 49(8): 330-335. https://doi.org/10.11896/jsjkx.210600046
[7]	孙奇, 吉根林, 张杰. 基于非局部注意力生成对抗网络的视频异常事件检测方法 Non-local Attention Based Generative Adversarial Network for Video Abnormal Event Detection 计算机科学, 2022, 49(8): 172-177. https://doi.org/10.11896/jsjkx.210600061
[8]	胡艳羽, 赵龙, 董祥军. 一种用于癌症分类的两阶段深度特征选择提取算法 Two-stage Deep Feature Selection Extraction Algorithm for Cancer Classification 计算机科学, 2022, 49(7): 73-78. https://doi.org/10.11896/jsjkx.210500092
[9]	程成, 降爱莲. 基于多路径特征提取的实时语义分割方法 Real-time Semantic Segmentation Method Based on Multi-path Feature Extraction 计算机科学, 2022, 49(7): 120-126. https://doi.org/10.11896/jsjkx.210500157
[10]	侯钰涛, 阿布都克力木·阿布力孜, 哈里旦木·阿布都克里木. 中文预训练模型研究进展 Advances in Chinese Pre-training Models 计算机科学, 2022, 49(7): 148-163. https://doi.org/10.11896/jsjkx.211200018
[11]	周慧, 施皓晨, 屠要峰, 黄圣君. 基于主动采样的深度鲁棒神经网络学习 Robust Deep Neural Network Learning Based on Active Sampling 计算机科学, 2022, 49(7): 164-169. https://doi.org/10.11896/jsjkx.210600044
[12]	苏丹宁, 曹桂涛, 王燕楠, 王宏, 任赫. 小样本雷达辐射源识别的深度学习方法综述 Survey of Deep Learning for Radar Emitter Identification Based on Small Sample 计算机科学, 2022, 49(7): 226-235. https://doi.org/10.11896/jsjkx.210600138
[13]	刘伟业, 鲁慧民, 李玉鹏, 马宁. 指静脉识别技术研究综述 Survey on Finger Vein Recognition Research 计算机科学, 2022, 49(6A): 1-11. https://doi.org/10.11896/jsjkx.210400056
[14]	孙福权, 崔志清, 邹彭, 张琨. 基于多尺度特征的脑肿瘤分割算法 Brain Tumor Segmentation Algorithm Based on Multi-scale Features 计算机科学, 2022, 49(6A): 12-16. https://doi.org/10.11896/jsjkx.210700217
[15]	康雁, 徐玉龙, 寇勇奇, 谢思宇, 杨学昆, 李浩. 基于Transformer和LSTM的药物相互作用预测 Drug-Drug Interaction Prediction Based on Transformer and LSTM 计算机科学, 2022, 49(6A): 17-21. https://doi.org/10.11896/jsjkx.210400150

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

基于多维度特征和混合神经网络的代码可读性评估方法

Code Readability Assessment Method Based on Multidimensional Features and Hybrid Neural Networks

PDF (PC)

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 15

Metrics

本文评价

推荐阅读 0