Computer Science, 2017, Vol. 44, Issue (Z6): 495-498. doi: 10.11896/j.issn.1002-137X.2017.6A.110

• Big Data and Data Mining •

Detection of Large Class Based on Latent Semantic Analysis

MA Sai, DONG Dong

  1. College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050024, China
  • Online: 2017-12-01  Published: 2018-12-01
  • Funding:
    This work was supported by the Natural Science Foundation of Hebei Province (F2013205192).

Abstract: Large Class (also called God Class) is a design flaw in object-oriented design. Traditional Large Class detection relies only on metrics of code structure. To overcome this limitation, this paper proposes a mean conceptual similarity metric based on latent semantic analysis (LSA). A term-document matrix is first built from the identifiers and comments extracted from the source code. The similarity between methods is then computed in the latent semantic space, from which the mean conceptual similarity of each class is obtained. Finally, this conceptual metric is combined with the cyclomatic complexity of the code to identify Large Classes. Experiments on Landfill, an open-source code smell detection dataset, show that, compared with traditional detection based on structural information, the proposed method improves both detection accuracy and recall to some extent.

Key words: Large Class, Latent semantic analysis, Code smell, Cyclomatic complexity
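The abstract compresses a multi-step pipeline: terms from identifiers and comments, a term-document matrix with one document per method, LSA, pairwise method similarity, a per-class mean, and a combination with cyclomatic complexity. The Python sketch below illustrates one plausible reading of those steps and is not the authors' implementation. Several choices are assumptions: raw term frequencies without weighting, a partial Java keyword stop list, cosine similarity, the mean taken over all distinct method pairs, and a stdlib-only Jacobi eigen-decomposition of AᵀA standing in for a library SVD, since the eigenvectors of AᵀA are the right singular vectors of A. The latent dimension `k`, the rough McCabe count in `cyclomatic`, and the thresholds in `is_large_class` are hypothetical and not taken from the paper.

```python
import math
import re

# Partial Java keyword list (an assumption): keywords are not domain vocabulary.
JAVA_KEYWORDS = {"abstract", "boolean", "break", "case", "catch", "class", "else",
                 "false", "final", "for", "if", "int", "new", "null", "private",
                 "protected", "public", "return", "static", "this", "true", "void", "while"}

def terms_of(text):
    """Split identifiers and comment words into lower-case terms (camelCase-aware)."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", text):
        for part in re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+", word):
            if part.lower() not in JAVA_KEYWORDS:
                terms.append(part.lower())
    return terms

def jacobi_eigen(sym, sweeps=50, tol=1e-12):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations."""
    n = len(sym)
    a = [row[:] for row in sym]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q), zeroing a[p][q]
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def lsa_method_vectors(method_texts, k=2):
    """Map each method (one document) to a k-dim LSA vector: a row of V_k * Sigma_k."""
    docs = [terms_of(t) for t in method_texts]
    vocab = sorted({w for d in docs for w in d})
    # Term-document matrix A: rows = terms, columns = methods, raw term frequencies.
    A = [[d.count(w) for d in docs] for w in vocab]
    n = len(docs)
    # A^T A = V Sigma^2 V^T, so its eigenpairs give the document side of the SVD.
    gram = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    eigvals, eigvecs = jacobi_eigen(gram)
    top = sorted(range(n), key=lambda i: eigvals[i], reverse=True)[:k]
    return [[math.sqrt(max(eigvals[i], 0.0)) * eigvecs[j][i] for i in top] for j in range(n)]

def cosine(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 0.0 if nx == 0 or ny == 0 else sum(a * b for a, b in zip(x, y)) / (nx * ny)

def mean_concept_similarity(method_texts, k=2):
    """Average cosine similarity over all distinct method pairs (1.0 if fewer than 2)."""
    vecs = lsa_method_vectors(method_texts, k)
    pairs = [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    if not pairs:
        return 1.0
    return sum(cosine(vecs[i], vecs[j]) for i, j in pairs) / len(pairs)

def cyclomatic(method_source):
    """Rough McCabe count for one method (HYPOTHETICAL): 1 + number of decision points."""
    decisions = re.findall(r"\b(?:if|for|while|case|catch)\b|&&|\|\||\?", method_source)
    return 1 + len(decisions)

def is_large_class(method_sources, min_total_cc=20, max_mcs=0.3, k=2):
    """HYPOTHETICAL rule: high summed complexity and low mean conceptual similarity."""
    total_cc = sum(cyclomatic(m) for m in method_sources)
    return total_cc >= min_total_cc and mean_concept_similarity(method_sources, k) <= max_mcs
```

In this sketch, a class whose methods share vocabulary scores a mean similarity near 1. Methods with disjoint vocabularies score near 0, which is the "conceptually scattered" signal the abstract pairs with complexity. For real projects a library SVD, such as `numpy.linalg.svd`, would replace `jacobi_eigen`, which is only practical for small classes.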

