Computer Science ›› 2020, Vol. 47 ›› Issue (3): 1-4. doi: 10.11896/jsjkx.200200009

Special topic: Intelligent Software Engineering

Research Status and Development Trend of Identifier Normalization

ZHANG Jing-xuan1, JIANG He2

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  2. School of Software, Dalian University of Technology, Dalian, Liaoning 116600, China
  • Received: 2019-12-05 Online: 2020-03-15 Published: 2020-03-30
  • Corresponding author: JIANG He (jianghe@dlut.edu.cn)
  • About the authors: ZHANG Jing-xuan, born in 1988, Ph.D., assistant professor, is a member of the China Computer Federation. His main research interests include mining software repositories. JIANG He, born in 1980, Ph.D., professor and Ph.D. supervisor, is a distinguished member of the China Computer Federation. His main research interests include mining software repositories and intelligent software engineering.
  • Supported by:
    This work was supported by the National Key Research and Development Plan of China (2018YFB1003900) and the National Natural Science Foundation of China (61902181).

Abstract: As an important part of source code analysis and comprehension, identifier normalization is an active research frontier in software engineering. Identifier normalization aims to parse identifiers into natural language terms so as to improve the understandability and maintainability of source code. It generally involves two challenging steps: identifier splitting, which breaks a compound identifier into its constituent words, and abbreviation expansion, which restores abbreviated words to their full forms. This paper reviews the state of research on identifier normalization in detail, analyzes it in depth, and summarizes the difficulties and shortcomings of existing work. To address these difficulties and challenges, it also summarizes feasible solutions and outlines future development trends in the field, in the hope of drawing more researchers to this important research area.

Key words: Abbreviation expansion, Identifier normalization, Identifier splitting, Software evolution, Source code analysis
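
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the method of any surveyed approach: the splitting rule (underscores, camelCase boundaries, digit runs) is a common naive baseline, and the `ABBREVIATIONS` dictionary, the function names, and the sample identifiers are all hypothetical, chosen only for illustration. Real techniques typically draw expansion candidates from mined dictionaries and the surrounding code context rather than a fixed table.

```python
import re

# Hypothetical abbreviation dictionary, for illustration only.
ABBREVIATIONS = {
    "btn": "button",
    "cnt": "count",
    "idx": "index",
    "msg": "message",
}

# Token pattern, tried in order:
#   1. a run of capitals not followed by a lowercase letter (acronym, e.g. "HTTP")
#   2. an optionally capitalized lowercase word (e.g. "get", "User")
#   3. a run of digits
# Underscores and other separators match nothing and are skipped.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(identifier):
    """Step 1: split a compound identifier into lowercase tokens."""
    return [token.lower() for token in _TOKEN.findall(identifier)]


def expand_tokens(tokens):
    """Step 2: expand known abbreviations; unknown tokens are kept as-is."""
    return [ABBREVIATIONS.get(token, token) for token in tokens]


def normalize(identifier):
    """Run both steps and join the result into a natural language phrase."""
    return " ".join(expand_tokens(split_identifier(identifier)))


if __name__ == "__main__":
    for name in ("msg_cnt", "getUserIdx", "HTTPResponse"):
        print(name, "->", normalize(name))
```

A fixed dictionary cannot resolve ambiguous abbreviations, and a rule-based splitter fails on identifiers with no case or separator cues (for example, all-lowercase concatenations), which is part of why both steps are described as challenging.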

CLC Number:

  • TP311.5