Computer Science (计算机科学) ›› 2020, Vol. 47 ›› Issue (5): 103-109. doi: 10.11896/jsjkx.180601099
向伟1, 王新维2
XIANG Wei1, WANG Xin-wei2
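Abstract: Imbalanced data classification is an important class of data classification problems. Traditional classification algorithms perform poorly on the smaller classes of an imbalanced data set. To address this, an imbalanced data classification algorithm based on a multi-class neighborhood three-way decision model is proposed. First, traditional three-way decisions are generalized to mixed data and to multiple classes, yielding a multi-class neighborhood three-way decision model for mixed data. Then, a method for setting adaptive cost functions within this model is given, and an imbalanced data classification algorithm built on the multi-class neighborhood three-way decision model is derived from it. Simulation results show that the proposed algorithm achieves better classification performance on imbalanced data.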
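To make the idea concrete, the following is a minimal sketch of a neighborhood-based three-way decision with class-dependent costs. It is not the authors' algorithm: the abstract does not give the distance measure, the neighborhood definition, or the adaptive cost formula. The threshold formulas are the standard decision-theoretic rough set ones; everything else is an assumption made for illustration, namely the mixed-attribute distance, the radius `delta`, the base losses, and the rule that scales the losses of not accepting a true class member by the imbalance ratio `n_max / n_c`. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Standard decision-theoretic rough set thresholds (alpha, beta) from six losses."""
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta


def adaptive_thresholds(labels, l_miss=4.0, l_false=4.0, l_defer=1.0):
    """ASSUMPTION: scale the losses for a true class member by n_max / n_c.

    Smaller classes get a lower alpha (easier to accept) and a lower beta
    (harder to reject). This rule is illustrative, not the paper's method.
    """
    counts = Counter(labels)
    n_max = max(counts.values())
    result = {}
    for c, n in counts.items():
        r = n_max / n
        result[c] = thresholds(0.0, l_defer * r, l_miss * r, l_false, l_defer, 0.0)
    return result


def mixed_distance(a, b, categorical):
    """ASSUMPTION: Euclidean-style distance; categorical attributes add 0 or 1."""
    total = 0.0
    for i, (u, v) in enumerate(zip(a, b)):
        d = (0.0 if u == v else 1.0) if i in categorical else float(u) - float(v)
        total += d * d
    return sqrt(total)


def three_way(x, samples, labels, delta, categorical, class_thresholds):
    """Return {class: 'accept' | 'defer' | 'reject'} for sample x."""
    # Neighborhood of x: all training samples within radius delta.
    idx = [j for j, s in enumerate(samples)
           if mixed_distance(x, s, categorical) <= delta]
    counts = Counter(labels[j] for j in idx)
    regions = {}
    for c, (alpha, beta) in class_thresholds.items():
        # Conditional probability of class c inside the neighborhood.
        p = counts[c] / len(idx) if idx else 0.0
        regions[c] = "accept" if p >= alpha else "reject" if p <= beta else "defer"
    return regions
```

With four majority samples and one minority sample in the neighborhood, a fixed threshold pair of (0.75, 0.25) would reject the minority class outright, whereas the scaled costs in this sketch move it into the deferred (boundary) region, which is the kind of effect a cost-sensitive setting is meant to produce.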