Computer Science, 2021, Vol. 48, Issue (12): 94-99. doi: 10.11896/jsjkx.200800193

• Computer Software •

Code Readability Assessment Method Based on Multidimensional Features and Hybrid Neural Networks

MI Qing, GUO Li-min, CHEN Jun-cheng

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2020-08-31 Revised: 2021-02-15 Online: 2021-12-15 Published: 2021-11-26
  • Corresponding author: CHEN Jun-cheng (juncheng@bjut.edu.cn)
  • About author: MI Qing (miqing@bjut.edu.cn), born in 1987, Ph.D., lecturer, is a member of China Computer Federation. Her main research interests include code readability assessment, deep learning and empirical experiments.
    CHEN Jun-cheng, born in 1980, Ph.D., lecturer, is a member of China Computer Federation. His main research interests include software testing, compiler optimization, machine learning and deep learning.
  • Supported by:
    National Natural Science Foundation of China (61702029), Natural Science Foundation of Beijing, China (4192004) and Project of Beijing Municipal Education Commission (KM201810005023).

Abstract: Quantitative and accurate assessment of code readability is an important way to ensure software quality, reduce communication and maintenance costs, and improve the efficiency of software development and evolution. However, most existing code readability studies rely on manual feature engineering, and their assessment accuracy is limited by factors such as the code representation strategy and the techniques used. Unlike prior studies, we use deep learning to propose a code readability assessment method based on multidimensional features and hybrid neural networks. Specifically, we first propose a representation strategy with different granularity levels (such as the character level and the token level) that transforms source code into matrices and vectors as the input to deep neural networks. We then build a CNN-BiGRU hybrid neural network that automatically learns structural and semantic features from the source code. Experimental results show that our method achieves an accuracy of 84.6%, which is 9.2% higher than CNN alone and 6.5% higher than BiGRU alone. Moreover, our method outperforms five state-of-the-art code readability models, which confirms the feasibility and effectiveness of the multidimensional features and hybrid neural network proposed in this study.

Key words: Code readability, Code representation, Deep learning, Code analysis, Software quality assurance
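
The abstract describes turning source code into matrices and vectors at different granularity levels before feeding them to the neural network. The sketch below illustrates one way such a representation could look. It is not the authors' implementation: the grid sizes, the ASCII encoding, the tokenizer regex, the padding and unknown IDs, and the toy vocabulary are all assumptions made for illustration.

```python
import re

# Hypothetical sizes; the abstract does not state the dimensions used.
MAX_LINES, MAX_COLS, MAX_TOKENS = 4, 16, 12


def char_matrix(code, max_lines=MAX_LINES, max_cols=MAX_COLS):
    """Character-level view: one row per source line, one ASCII code per
    character, truncated and zero-padded to a max_lines x max_cols grid."""
    rows = []
    for line in code.splitlines()[:max_lines]:
        # Non-ASCII characters are mapped to 0 (an assumption of this sketch).
        codes = [ord(ch) if ord(ch) < 128 else 0 for ch in line[:max_cols]]
        rows.append(codes + [0] * (max_cols - len(codes)))
    rows += [[0] * max_cols for _ in range(max_lines - len(rows))]
    return rows


# Simplistic tokenizer: identifiers, integer literals, or single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def token_vector(code, vocab, max_tokens=MAX_TOKENS):
    """Token-level view: map each token to an integer ID
    (0 = padding, 1 = unknown token), truncated and padded to max_tokens."""
    tokens = TOKEN_RE.findall(code)[:max_tokens]
    ids = [vocab.get(tok, 1) for tok in tokens]
    return ids + [0] * (max_tokens - len(ids))


snippet = "int a = 1;\nreturn a;"
vocab = {"int": 2, "=": 3, ";": 4, "return": 5}  # toy vocabulary
matrix = char_matrix(snippet)          # 4 x 16 grid of character codes
vector = token_vector(snippet, vocab)  # 12 token IDs
```

In a hybrid model of the kind named in the abstract, a grid like `matrix` would plausibly suit a convolutional branch and a sequence like `vector` a recurrent branch, though the abstract does not say which input feeds which part of the network.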

CLC Number:

  • TP311