计算机科学 ›› 2020, Vol. 47 ›› Issue (1): 87-95.doi: 10.11896/jsjkx.181202320
樊鑫1,陈红梅2
FAN Xin1,CHEN Hong-mei2
摘要: 分类问题普遍存在于现代工业生产中。在进行分类任务之前,利用特征选择筛选有用的信息,能够有效地提高分类效率和分类精度。最小冗余最大相关算法(mRMR)考虑最大化特征与类别的相关性和最小化特征之间的冗余性,能够有效地选择特征子集;但该算法存在中后期特征重要度偏差大以及无法直接给出特征子集的问题。针对该问题,文中提出了结合邻域粗糙集差别矩阵和mRMR原理的特征选择算法。根据最大相关性和最小冗余性原则,利用邻域熵和邻域互信息定义了特征的重要度,以更好地处理混合数据类型。基于差别矩阵定义了动态差别集,利用差别集的动态演化有效去除冗余属性,缩小搜索范围,优化特征子集,并根据差别矩阵判定迭代截止条件。实验选取SVM,J48,KNN和MLP作为分类器来评价该特征选择算法的性能。在公共数据集上的实验结果表明,与已有算法相比,所提算法的平均分类精度提升了2%左右,同时在特征较多的数据集上能够有效地缩短特征选择时间。所提算法继承了差别矩阵和mRMR的优点,能够有效地处理特征选择问题。
中图分类号:
[1]CHANDRASHEKAR G,SAHIN F.A survey on feature selection methods[J].Computers and Electrical Engineering,2014,40(1):16-28. [2]XE J Y,WANG M Z,ZHOU Y,et al.Differentially expressed gene selection algorithms for unbalanced gene datasets [J].Chinese Journal of Computers,2019,42(6):1232-1250. [3]WANG D,NIE F,HUANG H.Feature Selection via Global Redundancy Minimization[J].IEEE Transactions on Knowledge and Data Engineering,2015,27(10):2743-2755. [4]TASKIN G,KAYA H,BRUZZONE L.Feature selection based on high dimensional model representation for hyperspectral images[J].IEEE Transactions on Image Processing,2017,26(6):2918-2928. [5]SHEIKHPOUR R,SARRAM M A,GHARAGHANI S,et al.A Survey on semi-supervised feature selection methods[J].Pattern Recognition,2016,64(C):141-158. [6]MAFARJA M,MIRJALILI S.Whale optimization approaches for wrapper feature selection[J].Applied Soft Computing Journal,2018,62:441-453. [7]VIOLA M,SANGIOVANNI M,TORALDO G,et al.A genera- lized eigenvalues classifier with embedded feature selection[J].Optimization Letters,2017,11(2):299-311. [8]PENG H,LONG F,DING C.Feature selection based on mutual information:Criteria of Max-Dependency,Max-Relevance,and Min-Redundancy[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2005,27(8):1226-1238. [9]ESTEVEZ P A,TESMER M,PEREZ C A,et al.Normalized mutual information feature selection[J].IEEE Transactions on Neural Networks,2009,20(2):189-201. [10]SUN Z,ZHANG J,DAI L,et al.Neurocomputing Mutual information based multi-label feature selection via constrained convex optimization[J].Neurocomputing,2019,329:447-456. [11]WANG J,WEI J M,YANG Z,et al.Feature Selection by Maximizing Independent Classification Information[J].IEEE Transactions on Knowledge & Data Engineering,2017,29(4):828-841. [12]PAWLAK Z.Rough sets[J].International Journal of Computer & Information Sciences,1982,11(5):341-356. [13]LIU Y,CHEN Y,TAN K,et al.Maximum relevance,minimum redundancy band selection based on neighborhood rough set for hyperspectral data classification[J].Measurement Science and Technology,2016,27(12):125501. [14]YONG L,WENLIANG H,YUNLIANG J,et al.Quick attribute reduct algorithm for neighborhood rough set model[J].Information Sciences,2014,271:65-81. [15]JIANG Y,YU Y.Minimal attribute reduction with rough set based on compactness discernibility information tree[J].Soft Computing,Springer Berlin Heidelberg,2016,20(6):2233-2243. [16]DUBOIS D,PRADE H.Rough fuzzy sets and fuzzy rough sets[J].International Journal of General Systems,1990,17(2/3):191-209. [17]Hu Q H,YU D R,XIE Z X.Numerical Attribute Reduction Based on Neighborhood Granulation and Rough Approximattion [J].Journal of Software,2008,19(3):640-649. [18]SKOWRON A,RAUSZER C.The Discernibility Matrices and Functions in Information Systems[M]∥Intelligent Decision Support.Dordrecht:Springer,1992:331-362. [19]DAI J.Rough set approach to incomplete numerical data[J].Information Sciences,2013,241:43-57. [20]LI M,SHANG C,FENG S,et al.Quick attribute reduction in inconsistent decision tables[J].Information Sciences,2014,254:155-180. [21]YANG M,YANG P.A novel condensing tree structure for rough set feature selection[J].Neurocomputing,2008,71(4/5/6):1092-1100. [22]HU Q,ZHANG L,ZHANG D,et al.Measuring relevance between discrete and continuous features based on neighborhood mutual information[J].Expert Systems with Applications,2011,38(9):10737-10750. [23]LIN Y,HU Q,LIU J,et al.Multi-label feature selection based on neighborhood mutual information[J].Applied Soft Computing,2016,38:244-256. [24]HU Q,ZHANG L,ZHANG D,et al.Measuring relevance be- tween discrete and continuous features based on neighborhood mutual information[J].Expert Systems with Applications,2011,38(9):10737-10750. [25]BERMEJO P,OSSA L DE,GAMEZ J A.Knowledge-Based Systems Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking[J].Knowledge-Based Systems,2012,25(1):35-44. [26]HU Q,YU D,XIE Z.Neighborhood classifiers[J].Expert Systems with Applications,2008,34(2):866-876. |
[1] | 李斌, 万源. 基于相似度矩阵学习和矩阵校正的无监督多视角特征选择 Unsupervised Multi-view Feature Selection Based on Similarity Matrix Learning and Matrix Alignment 计算机科学, 2022, 49(8): 86-96. https://doi.org/10.11896/jsjkx.210700124 |
[2] | 胡艳羽, 赵龙, 董祥军. 一种用于癌症分类的两阶段深度特征选择提取算法 Two-stage Deep Feature Selection Extraction Algorithm for Cancer Classification 计算机科学, 2022, 49(7): 73-78. https://doi.org/10.11896/jsjkx.210500092 |
[3] | 康雁, 王海宁, 陶柳, 杨海潇, 杨学昆, 王飞, 李浩. 混合改进的花授粉算法与灰狼算法用于特征选择 Hybrid Improved Flower Pollination Algorithm and Gray Wolf Algorithm for Feature Selection 计算机科学, 2022, 49(6A): 125-132. https://doi.org/10.11896/jsjkx.210600135 |
[4] | 陈于思, 艾志华, 张清华. 基于三角不等式判定和局部策略的高效邻域覆盖模型 Efficient Neighborhood Covering Model Based on Triangle Inequality Checkand Local Strategy 计算机科学, 2022, 49(5): 152-158. https://doi.org/10.11896/jsjkx.210300302 |
[5] | 储安琪, 丁志军. 基于灰狼优化算法的信用评估样本均衡化与特征选择同步处理 Application of Gray Wolf Optimization Algorithm on Synchronous Processing of Sample Equalization and Feature Selection in Credit Evaluation 计算机科学, 2022, 49(4): 134-139. https://doi.org/10.11896/jsjkx.210300075 |
[6] | 孙林, 黄苗苗, 徐久成. 基于邻域粗糙集和Relief的弱标记特征选择方法 Weak Label Feature Selection Method Based on Neighborhood Rough Sets and Relief 计算机科学, 2022, 49(4): 152-160. https://doi.org/10.11896/jsjkx.210300094 |
[7] | 李宗然, 陈秀宏, 陆赟, 邵政毅. 鲁棒联合稀疏不相关回归 Robust Joint Sparse Uncorrelated Regression 计算机科学, 2022, 49(2): 191-197. https://doi.org/10.11896/jsjkx.210300034 |
[8] | 张叶, 李志华, 王长杰. 基于核密度估计的轻量级物联网异常流量检测方法 Kernel Density Estimation-based Lightweight IoT Anomaly Traffic Detection Method 计算机科学, 2021, 48(9): 337-344. https://doi.org/10.11896/jsjkx.200600108 |
[9] | 杨蕾, 降爱莲, 强彦. 基于自编码器和流形正则的结构保持无监督特征选择 Structure Preserving Unsupervised Feature Selection Based on Autoencoder and Manifold Regularization 计算机科学, 2021, 48(8): 53-59. https://doi.org/10.11896/jsjkx.200700211 |
[10] | 侯春萍, 赵春月, 王致芃. 基于自反馈最优子类挖掘的视频异常检测算法 Video Abnormal Event Detection Algorithm Based on Self-feedback Optimal Subclass Mining 计算机科学, 2021, 48(7): 199-205. https://doi.org/10.11896/jsjkx.200800146 |
[11] | 胡艳梅, 杨波, 多滨. 基于网络结构的正则化逻辑回归 Logistic Regression with Regularization Based on Network Structure 计算机科学, 2021, 48(7): 281-291. https://doi.org/10.11896/jsjkx.201100106 |
[12] | 周钢, 郭福亮. 基于特征选择的高维数据集成学习方法研究 Research on Ensemble Learning Method Based on Feature Selection for High-dimensional Data 计算机科学, 2021, 48(6A): 250-254. https://doi.org/10.11896/jsjkx.200700102 |
[13] | 丁思凡, 王锋, 魏巍. 一种基于标签相关度的Relief特征选择算法 Relief Feature Selection Algorithm Based on Label Correlation 计算机科学, 2021, 48(4): 91-96. https://doi.org/10.11896/jsjkx.200800025 |
[14] | 滕俊元, 高猛, 郑小萌, 江云松. 噪声可容忍的软件缺陷预测特征选择方法 Noise Tolerable Feature Selection Method for Software Defect Prediction 计算机科学, 2021, 48(12): 131-139. https://doi.org/10.11896/jsjkx.201000168 |
[15] | 张亚钏, 李浩, 宋晨明, 卜荣景, 王海宁, 康雁. 混合人工化学反应优化和狼群算法的特征选择 Hybrid Artificial Chemical Reaction Optimization with Wolf Colony Algorithm for Feature Selection 计算机科学, 2021, 48(11A): 93-101. https://doi.org/10.11896/jsjkx.210100067 |
|