Computer Science ›› 2019, Vol. 46 ›› Issue (7): 157-164. doi: 10.11896/j.issn.1002-137X.2019.07.025

• Artificial Intelligence •

Feature Selection Algorithm Based on Rough Sets and Fruit Fly Optimization

FANG Bo, CHEN Hong-mei, WANG Sheng-wu

  1. (School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China)
    (Key Laboratory of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu 611756, China)
  • Received: 2018-06-21 Online: 2019-07-15 Published: 2019-07-15
  • About the authors: FANG Bo (born 1991), male, master's student; main research interests include data mining and machine learning; E-mail: fangbo19910204@163.com. CHEN Hong-mei (born 1971), female, Ph.D., professor; main research interests include granular computing and rough sets, and intelligent information processing; E-mail: hmchen@swjtu.edu.cn (corresponding author). WANG Sheng-wu (born 1995), male, master's student; main research interests include cloud computing and intelligent technology.
  • Supported by:
    National Natural Science Foundation of China (61572406)

Abstract: Feature selection is one of the most important data preprocessing steps in pattern recognition. It aims to select, from the original feature set, the most effective feature subset, that is, the subset that optimizes a given evaluation criterion. This paper proposes a feature selection method based on rough set theory and the fruit fly optimization algorithm. A novel double strategies evolutionary fruit fly optimization algorithm (DSEFOA) searches for feature subsets by iterative optimization. Each candidate subset is evaluated by a fitness function that combines the rough set attribute dependency and attribute importance, so that the search both finds as many important features as possible across the whole feature space and selects the effective feature subset that contributes most to the decision. Experimental results on UCI datasets show that the proposed feature selection method can effectively find feature subsets with minimal information loss while achieving high classification accuracy.

Key words: Attribute dependency, Attribute importance, Double strategies evolutionary, Fruit fly optimization algorithm, Rough sets
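
To make the two rough set quantities named in the abstract concrete, the following is a minimal sketch of the classical Pawlak attribute dependency (the fraction of objects in the positive region) and of attribute importance measured as the drop in dependency when an attribute is removed. The abstract does not give the paper's fitness function, so `fitness`, its weight `alpha`, and the toy decision table are hypothetical stand-ins for illustration only, not the authors' DSEFOA formulation.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, decisions, attrs):
    """Pawlak dependency: size of the positive region divided by |U|."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        # A block belongs to the positive region if it has a single decision.
        if len({decisions[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def significance(rows, decisions, attrs, a):
    """Importance of attribute a within attrs: dependency lost if a is removed."""
    rest = [b for b in attrs if b != a]
    return dependency(rows, decisions, attrs) - dependency(rows, decisions, rest)


def fitness(rows, decisions, attrs, alpha=0.9):
    """Hypothetical fitness (an assumption, not the paper's formula):
    weighted sum of dependency and mean attribute importance."""
    if not attrs:
        return 0.0
    gamma = dependency(rows, decisions, attrs)
    mean_sig = sum(significance(rows, decisions, attrs, a) for a in attrs) / len(attrs)
    return alpha * gamma + (1 - alpha) * mean_sig


# Toy decision table (made up for illustration): three condition attributes.
rows = [(1, 0, 0), (1, 0, 1), (0, 1, 0), (0, 1, 1), (0, 0, 0)]
decisions = ["y", "y", "n", "n", "y"]

print(dependency(rows, decisions, [0]))  # 0.4
print(dependency(rows, decisions, [1]))  # 1.0
print(fitness(rows, decisions, [1]) > fitness(rows, decisions, [0, 1, 2]))
```

In this toy table, attribute 1 alone determines the decision, so under the assumed weighting the single-attribute subset scores higher than the full attribute set. A search procedure such as a fruit fly optimizer would call a function like `fitness` on each candidate subset.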

CLC Number:

  • TP301.6