Research Status and Development Trend of Identifier Normalization

doi:10.11896/jsjkx.200200009

Abstract: Identifier normalization is an important topic in source code analysis and comprehension, and it is at the forefront of current software engineering research. It aims to parse identifiers into natural language terms in order to improve the understandability and maintainability of source code. It generally involves two challenging steps: identifier splitting and identifier expansion. This paper reviews the research status of identifier normalization in detail, analyzes it in depth, and summarizes the difficulties and deficiencies of existing work. To address these difficulties and challenges, the paper also summarizes feasible solutions and discusses future development trends in the field, with the aim of guiding more researchers into this important research area.
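The two steps named in the abstract, identifier splitting and identifier expansion, can be illustrated with a minimal sketch. It is not a method taken from the paper: the abbreviation dictionary, the regular expression, and the function names split_identifier, expand_terms and normalize are all assumptions made for illustration, and the splitting relies only on underscores and letter-case boundaries.

```python
import re

# Hypothetical abbreviation dictionary, for illustration only.
# A real tool would mine such mappings from code, comments, or documentation.
ABBREVIATIONS = {
    "str": "string",
    "len": "length",
    "btn": "button",
    "msg": "message",
    "cnt": "count",
    "idx": "index",
}

# Alternatives, tried in order:
#   1. an acronym run that is followed by a capitalized word ("XML" in "XMLParser")
#   2. a capitalized or lowercase word ("Parser", "str")
#   3. an all-uppercase run ("MAX")
#   4. a run of digits
# Underscores and other separators match nothing and are skipped.
_TOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_identifier(identifier):
    """Step 1 (splitting): break an identifier into lowercase terms."""
    return [token.lower() for token in _TOKEN.findall(identifier)]


def expand_terms(terms, dictionary=ABBREVIATIONS):
    """Step 2 (expansion): replace known abbreviations, keep other terms."""
    return [dictionary.get(term, term) for term in terms]


def normalize(identifier):
    """Apply splitting, then expansion."""
    return expand_terms(split_identifier(identifier))


if __name__ == "__main__":
    for name in ("strLen", "XMLParser", "MAX_MSG_CNT", "strlen"):
        print(name, "->", normalize(name))
```

In this sketch, strLen becomes the terms string and length, while an identifier with no case or underscore boundary, such as strlen, passes through unsplit and unexpanded. A fixed dictionary also cannot choose between competing expansions of the same abbreviation. Both limitations are assumptions of the example rather than findings reported here.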

Key words: Abbreviation expansion, Identifier normalization, Identifier splitting, Software evolution, Source code analysis

CLC Number: TP311.5

ZHANG Jing-xuan, JIANG He. Research Status and Development Trend of Identifier Normalization [J]. Computer Science, 2020, 47(3): 1-4.

