Computer Science (计算机科学), 2022, Vol. 49, Issue 11: 98-108. doi: 10.11896/jsjkx.210900076
闫振超, 舒文豪, 谢昕
YAN Zhen-chao, SHU Wen-hao, XIE Xin
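Abstract: In many practical applications, data sets are mixed data composed of symbolic, numerical, and missing features. Obtaining decision labels for all of the data costs a great deal of labor and time, so only part of the data can be labeled, which yields partially labeled data. Meanwhile, data in real-world applications are generated dynamically: the dimensionality of the data grows or shrinks as requirements change. To address the high dimensionality, partial labeling, and dynamic nature of mixed data, this paper proposes two incremental feature selection algorithms for partially labeled mixed data. First, information granularity is used to analyze the significance of features in partially labeled mixed data. Second, for the case where the feature set changes dynamically, an incremental updating mechanism for information granularity is given, drawing on the idea of incremental learning. Then, on this basis, two incremental feature selection algorithms for partially labeled mixed data are proposed. Finally, comparisons with other algorithms on UCI data sets further verify the feasibility and effectiveness of the proposed algorithms.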
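The abstract does not give the paper's definitions, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a commonly used form of information granularity, the sum of tolerance-class sizes divided by |U|², with a tolerance relation in which symbolic values must match, numerical values must lie within a threshold `delta`, and a missing value is compatible with anything. The names `MISSING`, `feature_relation`, `granularity`, `add_feature`, and the value of `delta` are all hypothetical. It also ignores decision labels entirely and only illustrates how a cached relation can be updated when a feature is added.

```python
import numpy as np

MISSING = None  # hypothetical marker for a missing value


def feature_relation(column, numeric, delta=0.1):
    """Boolean n x n tolerance matrix induced by a single feature."""
    missing = np.array([v is MISSING for v in column])
    if numeric:
        # Placeholder 0.0 for missing entries; they are overridden below.
        vals = np.array([0.0 if v is MISSING else float(v) for v in column])
        rel = np.abs(vals[:, None] - vals[None, :]) <= delta
    else:
        vals = np.array(["" if v is MISSING else str(v) for v in column])
        rel = vals[:, None] == vals[None, :]
    # Assumption: a missing value is compatible with every other value.
    return rel | missing[:, None] | missing[None, :]


def granularity(rel):
    """Sum of tolerance-class sizes divided by n^2 (assumed definition)."""
    n = rel.shape[0]
    return float(rel.sum()) / (n * n)


def add_feature(rel, column, numeric, delta=0.1):
    """Incremental update: intersect the cached relation with the new feature's."""
    return rel & feature_relation(column, numeric, delta)


color = ["red", "red", MISSING, "blue"]   # symbolic feature
weight = [0.10, 0.15, 0.90, MISSING]      # numerical feature

rel_color = feature_relation(color, numeric=False)
rel_both = add_feature(rel_color, weight, numeric=True)

g_color = granularity(rel_color)
g_both = granularity(rel_both)
significance = g_color - g_both  # drop in granularity caused by adding `weight`
print(g_color, g_both, significance)
```

On this toy data the granularity drops from 0.75 to 0.5 when `weight` is added, so under these assumptions the feature has significance 0.25. Adding a feature only needs one intersection with the cached matrix. Deleting a feature cannot be undone from the intersected matrix alone, so a sketch like this would have to cache each feature's relation and recombine the remaining ones.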