Computer Science ›› 2021, Vol. 48 ›› Issue (12): 59-66. doi: 10.11896/jsjkx.210100077

• Computer Software •

Approach of God Class Detection Based on Evolutionary and Semantic Features

WANG Ji-wen, WU Yi-jian, PENG Xin

  1. Software School, Fudan University, Shanghai 200438, China
    Shanghai Key Laboratory of Data Science, Shanghai 200438, China
  • Received: 2021-01-10 Revised: 2021-03-21 Online: 2021-12-15 Published: 2021-11-26
  • Corresponding author: WU Yi-jian (wuyijian@fudan.edu.cn)
  • Author e-mail: 18212010032@fudan.edu.cn
  • About author: WANG Ji-wen, born in 1997, postgraduate. His main research interests include software design analysis and software evolution analysis.
    WU Yi-jian, born in 1979, Ph.D., associate professor, is a member of China Computer Federation. His main research interests include big code analysis, software evolution analysis, and code clone detection and management.
  • Supported by:
    National Key R&D Program of China (2017YFB1002000) and Shanghai Science and Technology Development Funds (18DZ1112100, 18DZ1112102).

Abstract: As software development iterations accelerate, developers often violate basic software design principles under delivery pressure and for other reasons, which introduces code smells and degrades software quality. God class, one of the most common code smells, refers to a class that has taken on too many responsibilities. God classes violate the design principle of “high cohesion and low coupling”, damage the quality of the software system, and impair the understandability and maintainability of the code. This paper therefore proposes a new God class detection method. It first extracts features of the methods in real projects along evolutionary, semantic, and other dimensions. It then fuses the evolutionary and semantic features and re-clusters the methods on the fused result, so that closely related methods are grouped into the same new cluster. By analyzing how the member methods of each class in the real projects are distributed across the new clusters, it calculates the cohesion of each class and reports the classes with low cohesion as God classes. Experiments show that the proposed method outperforms current mainstream God class detection methods. Compared with traditional metric-based detection methods, it improves both recall and precision by more than 20 percentage points. Compared with machine-learning-based detection methods, its recall is slightly lower, but its precision and F1 score are significantly higher.

Key words: God class, Code smell, Software evolution, Cohesion
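
The pipeline summarized in the abstract (per-method features, fusion, re-clustering, per-class cohesion) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the fusion scheme, the clustering algorithm, the cohesion formula, or any thresholds, so the sketch assumes all of them: fusion is a weighted mean of two cosine similarities, clustering is single-link grouping above a similarity threshold, and cohesion is the share of a class's methods that land in its most common cluster. The function names, the `weight`, `threshold`, and `min_cohesion` parameters, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_methods(features, weight=0.5, threshold=0.7):
    """Group methods whose fused similarity reaches `threshold`.

    `features` maps a method name to (evolution_vector, semantic_vector).
    ASSUMPTION: fusion is a weighted mean of the two cosine similarities and
    clustering is single-link grouping via union-find; both are stand-ins.
    """
    names = list(features)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            evo = cosine(features[a][0], features[b][0])
            sem = cosine(features[a][1], features[b][1])
            if weight * evo + (1 - weight) * sem >= threshold:
                parent[find(a)] = find(b)  # merge the two groups
    return {n: find(n) for n in names}


def class_cohesion(owner, cluster_of):
    """ASSUMED cohesion: share of a class's methods in its most common cluster."""
    per_class = defaultdict(Counter)
    for method, cls in owner.items():
        per_class[cls][cluster_of[method]] += 1
    return {cls: max(c.values()) / sum(c.values()) for cls, c in per_class.items()}


def detect_god_classes(owner, features, min_cohesion=0.6):
    """Return the classes whose cohesion is below `min_cohesion`, sorted by name."""
    cohesion = class_cohesion(owner, cluster_methods(features))
    return sorted(cls for cls, score in cohesion.items() if score < min_cohesion)


# Toy data (hypothetical): evolution vectors mark which of four commits touched
# a method; semantic vectors stand in for text embeddings of the method body.
features = {
    "add_item":    ([1, 1, 0, 0], [1.0, 0.0, 0.0]),
    "remove_item": ([1, 1, 0, 0], [1.0, 0.0, 0.1]),
    "add_tax":     ([1, 1, 0, 0], [0.9, 0.1, 0.0]),
    "send_mail":   ([0, 0, 1, 0], [0.0, 1.0, 0.0]),
    "log_mail":    ([0, 0, 1, 0], [0.0, 1.0, 0.1]),
    "render_page": ([0, 0, 0, 1], [0.0, 0.0, 1.0]),
}
owner = {
    "add_item": "Order", "remove_item": "Order",
    "add_tax": "Manager", "send_mail": "Manager",
    "log_mail": "Manager", "render_page": "Manager",
}

print(detect_god_classes(owner, features))  # prints ['Manager']
```

In this toy input the four `Manager` methods fall into three different clusters, so the largest share is 2 of 4 and the assumed cohesion is 0.5, below the 0.6 cutoff. Both `Order` methods share one cluster, giving a cohesion of 1.0.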

CLC Number:

  • TP311.5