Computer Science ›› 2018, Vol. 45 ›› Issue (11A): 497-500.

• Software Engineering and Database Technology •

Long Method Detection Based on Cost-sensitive Integrated Classifier

LIU Li-qian, DONG Dong

  1. College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050024, China
  • Online: 2019-02-26  Published: 2019-02-26
  • Corresponding author: DONG Dong (born 1971), male, master's degree, associate professor, CCF member. Main research interest: empirical software engineering. E-mail: dongdong@hebtu.edu.cn
  • About the author: LIU Li-qian (born 1991), female, master's student. Main research interest: empirical software engineering. E-mail: 1437784143@qq.com

Abstract: Long Method is a software design problem in which a method has grown so long that it needs refactoring. To improve the detection rate of traditional machine learning approaches on Long Method, a cost-sensitive integrated classifier algorithm is proposed that targets the class imbalance of code smell data. Taking the traditional decision tree algorithm as its basis, the approach resamples the data with an under-sampling strategy to generate multiple balanced subsets, trains a base classifier of the same type on each subset, and then combines these base classifiers into an integrated classifier. Finally, a misclassification cost determined by cognitive complexity is introduced into the integrated classifier, which biases it toward correctly classifying the minority class. Compared with traditional machine learning algorithms, this method improves both the precision and the recall of Long Method detection to some extent.

Key words: Code smell, Cognitive complexity, Cost-sensitive, Long method

CLC number:

  • TP311
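
The pipeline outlined in the abstract (under-sample into balanced subsets, train one base classifier per subset, combine them, then apply a cost-weighted decision) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made for illustration: a one-level decision tree (stump) stands in for the decision tree base learner, the two features (lines of code, parameter count) and the toy data are invented, and the cost formula `1 + cognitive_complexity / scale` is hypothetical, since the abstract does not say how the cost is derived from cognitive complexity.

```python
import random

def train_stump(samples):
    """Fit a one-level decision tree (assumed stand-in for the base learner).
    Rule form: predict 1 (long method) when feature f exceeds threshold t."""
    best = None
    for f in range(len(samples[0][0])):
        for t in sorted({x[f] for x, _ in samples}):
            errors = sum((x[f] > t) != bool(y) for x, y in samples)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, f, t = best
    return lambda x: 1 if x[f] > t else 0

def balanced_subsets(samples, n_subsets, rng):
    """Under-sampling: keep every minority sample and draw an equally
    sized random subset of the majority class for each balanced subset."""
    minority = [s for s in samples if s[1] == 1]
    majority = [s for s in samples if s[1] == 0]
    return [minority + rng.sample(majority, len(minority))
            for _ in range(n_subsets)]

def train_ensemble(samples, n_subsets=5, seed=0):
    """Train one base classifier of the same type per balanced subset."""
    rng = random.Random(seed)
    return [train_stump(s) for s in balanced_subsets(samples, n_subsets, rng)]

def predict(ensemble, x, cognitive_complexity, fp_cost=1.0, scale=10.0):
    """Combine votes, then pick the label with the lower expected cost."""
    p_long = sum(clf(x) for clf in ensemble) / len(ensemble)
    # Hypothetical cost model: missing a long method costs more when the
    # method's cognitive complexity is higher.
    fn_cost = 1.0 + cognitive_complexity / scale
    return 1 if p_long * fn_cost > (1.0 - p_long) * fp_cost else 0

# Invented toy data: (lines_of_code, parameter_count) -> 1 if long method.
samples = [((loc, 2), 0) for loc in range(5, 25)]
samples += [((loc, 4), 1) for loc in (80, 95, 120, 150)]

ensemble = train_ensemble(samples)
print(predict(ensemble, (100, 3), cognitive_complexity=12))
print(predict(ensemble, (6, 1), cognitive_complexity=0))
```

In this sketch the cost only changes the outcome when the base classifiers disagree: with 40% of the votes for "long method", a cognitive complexity of 0 leaves the method unflagged, while a complexity of 10 doubles the assumed cost of a miss and tips the decision toward the minority class.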