Computer Science (计算机科学), 2018, Vol. 45, Issue (11A): 497-500.
LIU Li-qian, DONG Dong (刘丽倩, 董东)
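Abstract: Long Method is a software design problem in which a method has grown so long that it needs to be refactored. Code smell data are class-imbalanced, and traditional machine learning methods recognize Long Method poorly. To address both problems, a cost-sensitive ensemble classifier algorithm is proposed. The algorithm starts from the traditional decision tree. An undersampling strategy resamples the data into multiple balanced subsets, and each subset trains one base classifier of the same type. These base classifiers are then combined into an ensemble classifier. Finally, a misclassification cost determined by cognitive complexity is introduced into the ensemble, which biases it toward correctly classifying the minority class. Compared with traditional machine learning algorithms, the method improves both the precision and the recall of Long Method detection to some extent.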
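The abstract gives no concrete parameters, so the Python sketch below only illustrates the general technique under stated assumptions. It is not the authors' implementation. The assumptions are:

- **Base learner:** a small hand-written Gini decision tree stands in for the paper's decision tree.
- **Undersampling:** each balanced subset keeps every minority sample plus an equal-size random draw from the majority (EasyEnsemble-style).
- **Combination:** the trees are combined by averaging their leaf probabilities.
- **Cost:** the cost of missing a Long Method is a hypothetical linear function of the method's cognitive complexity, applied through a Bayes minimum-risk decision rule.

The function names, the `scale` parameter, and the two toy features (lines of code, cognitive complexity) are illustrative inventions.

```python
import random
from collections import Counter

# --- Base learner: a tiny CART-style decision tree (Gini impurity) ---------

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=3, min_size=2):
    """Return a leaf (fraction of label 1) or a tuple (feature, threshold, left, right)."""
    if depth >= max_depth or len(set(y)) == 1 or len(y) < min_size:
        return sum(y) / len(y)
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no threshold separates the samples
        return sum(y) / len(y)
    _, f, t = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_size)

    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(left_idx), grow(right_idx))


def tree_proba(node, x):
    """Walk from the root to a leaf and return its positive-class fraction."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

# --- Undersampling into several balanced subsets ---------------------------

def balanced_subsets(y, n_subsets, rng):
    """Yield index lists: all minority samples plus an equal-size random majority draw."""
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    for _ in range(n_subsets):
        yield minority + rng.sample(majority, len(minority))


def fit_ensemble(X, y, n_subsets=5, max_depth=3, seed=0):
    """Train one tree of the same type on each balanced subset."""
    rng = random.Random(seed)
    return [build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth=max_depth)
            for idx in balanced_subsets(y, n_subsets, rng)]


def ensemble_proba(trees, x):
    """Average the trees' leaf fractions (soft voting, an assumption)."""
    return sum(tree_proba(t, x) for t in trees) / len(trees)

# --- Cost-sensitive decision ------------------------------------------------

def cost_from_cognitive_complexity(cc, scale=15.0):
    """HYPOTHETICAL mapping: the more complex the method, the costlier a missed Long Method."""
    return 1.0 + cc / scale


def min_risk_decision(p_pos, cost_fn, cost_fp=1.0):
    """Predict 1 iff the expected cost of saying 0 (p * C_FN) >= that of saying 1 ((1 - p) * C_FP)."""
    return 1 if p_pos * cost_fn >= (1.0 - p_pos) * cost_fp else 0


def predict(trees, x, cc_index=1):
    """Combine the ensemble probability with a per-method, complexity-based miss cost."""
    p = ensemble_proba(trees, x)
    return min_risk_decision(p, cost_from_cognitive_complexity(x[cc_index]))


if __name__ == "__main__":
    # Toy data: [lines of code, cognitive complexity]; label 1 = Long Method (minority).
    X = ([[loc, loc // 10] for loc in range(5, 45)]
         + [[loc, loc // 8] for loc in (80, 95, 120, 150, 200)])
    y = [0] * 40 + [1] * 5
    trees = fit_ensemble(X, y)
    print(predict(trees, [300, 30]), predict(trees, [6, 0]))  # 1 0
```

Raising the miss cost lowers the probability a method needs before it is flagged. The rule `p * C_FN >= (1 - p) * C_FP` is equivalent to `p >= C_FP / (C_FP + C_FN)`. With `C_FP = 1` and `C_FN = 2`, for example, a method is reported as a Long Method once `p >= 1/3` instead of `p >= 1/2`. This is the sense in which the cost tilts the classifier toward the minority class.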