Computer Science, 2024, Vol. 51, Issue (6): 23-33. doi: 10.11896/jsjkx.231100030

Computer Software

Nonsense Variable Names Detection Method Based on Lexical Features and Data Mining

JIANG Yanjie¹, DONG Chunhao², LIU Hui²

  ¹ School of Computer Science and Technology, Peking University, Beijing 100087, China
  ² School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2023-11-03; Revised: 2024-03-27; Online: 2024-06-15; Published: 2024-06-05
  • About author: JIANG Yanjie, born in 1993, research assistant, is a member of CCF (No. D7588G). Her main research interests include software refactoring and software testing.
  • Supported by: Key Program of the National Natural Science Foundation of China (62232003).

Abstract: Identifiers are an important part of code and one of the key elements that help people understand its semantics. Variables are widely used to represent objects in programs, and their names, when chosen carefully and properly, serve as a major clue to the responsibilities of those variables. However, developers frequently write poor variable names (e.g., “a”, “var”). Such nonsense variable names severely harm the readability and maintainability of software applications. Consequently, the automated identification of such bad smells is a hot topic in software refactoring. To identify nonsense names automatically, we conduct an empirical study to find the key features that can distinguish nonsense names from well-constructed, meaningful ones. The results suggest that nonsense variable names are often short and rarely contain meaningful words. Based on these findings, we propose an approach to identifying nonsense variable names that combines heuristics with data mining. It first retrieves suspicious variable names based on lexical analysis. It then applies abbreviation expansion-based filtering to the suspicious names, excluding those that were carefully constructed as abbreviations of meaningful words. Finally, it applies data mining-based filtering to further exclude well-known symbols (e.g., “i”, “e”). Experimental results on open-source datasets show that the proposed method achieves high accuracy, with an average precision of 85% and an average recall of 91.5%.
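The three-stage pipeline described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The tiny dictionary, the abbreviation table, the 4-character length threshold, and all function names (`split_identifier`, `is_suspicious`, `is_abbreviation`, `mine_conventional_symbols`, `detect_nonsense`) are hypothetical. The abstract also does not say which data mining technique the third stage uses. Here a simple association-rule style check stands in for it: a name counts as a well-known symbol when it is used often and consistently in one context, such as a loop counter.

```python
import re
from collections import Counter

# Hypothetical resources: a real tool would use a full English word list
# and a much larger abbreviation dictionary; these are toy stand-ins.
DICTIONARY = {"count", "index", "name", "total", "user", "file", "buffer", "value"}
ABBREVIATIONS = {"cnt": "count", "idx": "index", "buf": "buffer", "usr": "user", "val": "value"}


def split_identifier(name):
    """Split camelCase / snake_case identifiers into lowercase tokens (digits dropped)."""
    tokens = []
    for part in re.split(r"[_\d]+", name):
        tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", part))
    return [t.lower() for t in tokens]


def is_suspicious(name, max_len=4):
    """Stage 1 (lexical): flag short names that contain no dictionary word.

    The 4-character threshold is an assumption, not a value from the paper.
    """
    has_word = any(t in DICTIONARY for t in split_identifier(name))
    return len(name) <= max_len and not has_word


def is_abbreviation(name):
    """Stage 2: True if every token expands (via ABBREVIATIONS) to a dictionary word."""
    tokens = split_identifier(name)
    return bool(tokens) and all(ABBREVIATIONS.get(t, t) in DICTIONARY for t in tokens)


def mine_conventional_symbols(usages, min_support=3, min_confidence=0.8):
    """Stage 3 (assumed technique): mine name -> context rules from (name, context) pairs.

    A name counts as a well-known symbol when one usage context (e.g. a loop
    counter) occurs at least `min_support` times and accounts for at least
    `min_confidence` of that name's occurrences.
    """
    pair_counts = Counter(usages)
    name_counts = Counter(name for name, _ in usages)
    return {
        name
        for (name, _context), count in pair_counts.items()
        if count >= min_support and count / name_counts[name] >= min_confidence
    }


def detect_nonsense(names, conventional):
    """Keep names that are suspicious, not abbreviations, and not well-known symbols."""
    return [
        n for n in names
        if is_suspicious(n) and not is_abbreviation(n) and n not in conventional
    ]


# Toy corpus of (name, usage context) pairs; the context labels are illustrative.
corpus = (
    [("i", "loop_counter")] * 9 + [("i", "other")]
    + [("e", "catch_param")] * 5
    + [("a", "loop_counter"), ("a", "field"), ("a", "param"), ("a", "local")]
)
conventional = mine_conventional_symbols(corpus)
result = detect_nonsense(["a", "var", "cnt", "i", "e", "userName", "xyz"], conventional)
print(result)  # ['a', 'var', 'xyz']
```

On this toy input, each stage removes a different kind of name:

- “cnt” is dropped as an abbreviation of “count”.
- “i” and “e” are dropped as conventional symbols.
- “userName” is never flagged, because it is long and contains dictionary words.

A context-consistency rule is used here instead of raw frequency. As a result, a name like “a”, which is scattered across unrelated contexts, gets neither enough support nor enough confidence to be whitelisted.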

Key words: Software refactoring, Code quality, Data mining, Nonsense variable names, Lexical features

CLC Number: TP311