Computer Science, 2020, Vol. 47, Issue (3): 1-4. doi: 10.11896/jsjkx.200200009
Special topic: Intelligent Software Engineering
张静宣1,江贺2
ZHANG Jing-xuan1,JIANG He2
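Abstract: As an important part of code analysis and comprehension, source code identifiers and their normalization are an active frontier of international research. Identifier normalization aims to resolve identifiers into natural-language words in order to improve the understandability and maintainability of code. It consists mainly of two highly challenging steps: compound-word splitting and abbreviation expansion. This paper reviews the state of research on identifier normalization in detail, analyzes it in depth, and summarizes the difficulties and shortcomings of existing work. To address these difficulties and challenges, it also outlines feasible solutions and likely future trends in the field, in the hope of encouraging more researchers to work on this important research area.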
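The two steps named in the abstract can be illustrated with a minimal sketch. It is not a method from the paper: it assumes identifiers carry explicit delimiters (underscores, digits, camel case), and it uses a small hand-written abbreviation table. The names `ABBREVIATIONS`, `split_identifier`, and `normalize_identifier` are hypothetical and introduced only for illustration.

```python
import re

# Hypothetical abbreviation table (an assumption for this sketch);
# a real approach would not rely on a fixed hand-written dictionary.
ABBREVIATIONS = {
    "btn": "button",
    "cnt": "count",
    "idx": "index",
    "msg": "message",
    "str": "string",
}


def split_identifier(identifier):
    """Step 1, compound-word splitting: cut on underscores, digits, and case changes."""
    tokens = []
    for part in re.split(r"[_\d]+", identifier):
        # Match an acronym run (e.g. "HTML"), a capitalized word, or a lowercase word.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [token.lower() for token in tokens]


def normalize_identifier(identifier):
    """Step 2, abbreviation expansion: replace each token found in the table."""
    return [ABBREVIATIONS.get(token, token) for token in split_identifier(identifier)]


if __name__ == "__main__":
    for name in ("parseHTMLStr", "msgCnt", "btn_idx2"):
        print(name, normalize_identifier(name))
```

An identifier written in a single case with no delimiter, such as `msgcnt`, passes through this sketch unsplit, and an abbreviation missing from the table passes through unexpanded.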