Computer Science ›› 2022, Vol. 49 ›› Issue (11): 98-108.doi: 10.11896/jsjkx.210900076

• Database & Big Data & Data Science •

Incremental Feature Selection Algorithm for Dynamic Partially Labeled Hybrid Data

YAN Zhen-chao, SHU Wen-hao, XIE Xin   

  1. School of Information Engineering,East China Jiaotong University,Nanchang 330013,China
  • Received:2021-09-09 Revised:2021-12-28 Online:2022-11-15 Published:2022-11-03
  • About author:YAN Zhen-chao,born in 1997,postgraduate.His main research interests include granular computing,knowledge discovery,data mining,etc.
    SHU Wen-hao,born in 1985,Ph.D,associate professor,master supervisor.Her main research interests include data mining,knowledge discovery,etc.
  • Supported by:
    National Natural Science Foundation of China(61662023,61762037) and Natural Science Foundation of Jiangxi Province(20202BABL202037).

Abstract: Many real-world data sets are hybrid data consisting of symbolic, numerical and missing features. Acquiring the decision labels for all the data is labor-intensive and expensive, so partially labeled data arise. Meanwhile, data in real-world applications change dynamically, i.e., features are added to or deleted from the feature set as requirements change. In this paper, incremental feature selection algorithms are proposed for hybrid data that are high-dimensional, partially labeled and dynamic. Firstly, information granularity is used to measure feature significance for partially labeled hybrid data. Then, incremental updating mechanisms for the information granularity are developed for variations of the feature set. On this basis, incremental feature selection algorithms are proposed for partially labeled hybrid data. Finally, extensive experiments on UCI data sets demonstrate that the proposed algorithms are feasible and efficient.
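To illustrate the incremental idea the abstract describes, the following is a minimal sketch, not the authors' algorithm: it uses a common rough-set notion of knowledge granularity over a partition (sum of squared block sizes divided by |U|^2; finer partitions give smaller values) and shows how, when a feature is added, the existing equivalence classes can be refined in place instead of repartitioning the universe over all selected features from scratch. The data layout, `granularity`, and `refine` are illustrative assumptions; handling of numerical, missing, and unlabeled parts of hybrid data is omitted.

```python
from collections import defaultdict

def granularity(partition, n):
    # Knowledge granularity of a partition of n objects:
    # sum(|X_i|^2) / n^2; a finer partition yields a smaller value.
    return sum(len(block) ** 2 for block in partition) / (n * n)

def refine(partition, data, feat):
    # Incremental update: split each existing equivalence class by the
    # value of the newly added feature, rather than recomputing the
    # partition over the whole selected feature set.
    refined = []
    for block in partition:
        groups = defaultdict(list)
        for i in block:
            groups[data[i][feat]].append(i)
        refined.extend(groups.values())
    return refined

# Toy symbolic data: rows are objects, columns are features.
data = [("a", 0), ("a", 1), ("b", 0), ("b", 1)]
n = len(data)

part = [list(range(n))]        # partition under the empty feature set
part = refine(part, data, 0)   # feature 0 arrives
g0 = granularity(part, n)      # two blocks of size 2 -> 8/16 = 0.5
part = refine(part, data, 1)   # feature 1 arrives: refine, not recompute
g1 = granularity(part, n)      # four singletons -> 4/16 = 0.25
```

The drop in granularity (0.5 to 0.25) plays the role of a feature-significance signal: a candidate feature that refines the partition more is more significant, and the incremental refinement keeps the per-update cost proportional to the current blocks rather than to the full selected feature set.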

Key words: Hybrid data, Partially labeled, Incremental learning, Information granularity, Feature selection

CLC Number: TP391