Computer Science ›› 2020, Vol. 47 ›› Issue (5): 103-109. doi: 10.11896/jsjkx.180601099

• Database & Big Data & Data Science •

Imbalance Data Classification Based on Model of Multi-class Neighbourhood Three-way Decision

XIANG Wei1, WANG Xin-wei2

  1. 1 School of Intelligent Manufacturing, Sichuan University of Arts and Science, Dazhou, Sichuan 635000, China
    2 College of Computer Science, Sichuan University, Chengdu 610065, China
  • Received: 2018-06-19 Online: 2020-05-15 Published: 2020-05-19
  • Corresponding author: XIANG Wei (xiangwei19766@163.com)
  • About author: XIANG Wei, born in 1976, associate professor. His main research interests include computer-based information processing and intelligent algorithms.
  • Supported by:
    This work was supported by the Major Project of Sichuan Education Department (16ZB0360).

Abstract: Imbalanced data classification is an important classification problem. Traditional classification algorithms perform poorly on the smaller classes in imbalanced data. To address this, this paper proposes an imbalanced data classification algorithm based on a multi-class neighbourhood three-way decision model. First, the traditional three-way decision is generalized to mixed data and multiple classes, yielding a multi-class neighbourhood three-way decision model for mixed data. Then, a method for setting an adaptive cost function is given within this model, and an imbalanced data classification algorithm based on the multi-class neighbourhood three-way decision model is built on that method. Simulation results show that the proposed algorithm achieves better classification performance on imbalanced data.

Key words: Classification, Cost function, Imbalance data, Self-adaption, Three-way decision
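
For readers unfamiliar with the building blocks named in the abstract, the following is a minimal sketch of a generic neighbourhood-based three-way decision, not the authors' algorithm. It assumes the standard decision-theoretic rough set thresholds derived from six loss values, a simple mixed-data distance (squared difference on numeric attributes, 0/1 mismatch on categorical ones), and a neighbourhood radius `delta`. All function names, loss values, and sample data are hypothetical, and the paper's adaptive cost function is not reproduced here.

```python
import math

def thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Standard decision-theoretic thresholds (alpha, beta) from six losses.

    l_xy is the loss of taking action x (p=accept, b=defer, n=reject)
    when the sample's true state is y (p=in class, n=not in class).
    """
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta

def mixed_distance(a, b, numeric_idx):
    """Assumed mixed-data distance: Euclidean on numeric attributes,
    0/1 mismatch on categorical attributes."""
    total = 0.0
    for i, (u, v) in enumerate(zip(a, b)):
        if i in numeric_idx:
            total += (u - v) ** 2
        else:
            total += 0.0 if u == v else 1.0
    return math.sqrt(total)

def neighbourhood_probability(x, samples, labels, target, delta, numeric_idx):
    """Fraction of samples within radius delta of x that carry label target."""
    hood = [y for s, y in zip(samples, labels)
            if mixed_distance(x, s, numeric_idx) <= delta]
    if not hood:
        return 0.0
    return sum(1 for y in hood if y == target) / len(hood)

def three_way_decision(p, alpha, beta):
    """Accept, reject, or defer membership in a class given probability p."""
    if p >= alpha:
        return "accept"
    if p <= beta:
        return "reject"
    return "defer"

# Hypothetical toy data: one numeric attribute, one categorical attribute.
samples = [(0.1, "a"), (0.2, "a"), (0.9, "b")]
labels = ["minority", "minority", "majority"]
alpha, beta = thresholds(l_pp=0, l_bp=2, l_np=6, l_pn=8, l_bn=2, l_nn=0)
p = neighbourhood_probability((0.15, "a"), samples, labels,
                              "minority", delta=0.2, numeric_idx={0})
decision = three_way_decision(p, alpha, beta)
```

In a multi-class setting, one plausible reading is that such a decision is made once per class, each with its own loss values; making the losses depend on class size is one way a cost function could favour small classes, but the specific adaptive setting proposed in the paper is not given in the abstract.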

CLC Number: 

  • TP18