Computer Science ›› 2018, Vol. 45 ›› Issue (11A): 427-430.

• Big Data & Data Mining •

Semi-supervised Feature Selection Algorithm Based on Information Entropy

WANG Feng, LIU Ji-chao, WEI Wei   

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Online: 2019-02-26  Published: 2019-02-26

Abstract: In practical applications, determining data labels is usually expensive, so researchers can often label only a very small amount of data. To address this "small labeled sample" problem, this paper proposed an entropy-based rough feature selection algorithm built on rough set theory and information entropy. Information entropy and feature significance were defined in the semi-supervised setting, and on this basis a new semi-supervised feature selection algorithm was proposed for datasets that contain only a small number of labeled samples. Experimental results show that the new algorithm is feasible and efficient.
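To make the general idea concrete, the following Python sketch illustrates entropy-driven forward feature selection on a dataset where only a few samples carry labels. It is a minimal, hypothetical example (the function names, toy data, and greedy stopping rule are assumptions for illustration), not the significance measure or algorithm defined in the paper: it computes the conditional entropy H(D|B) of the decision over the equivalence classes induced by a candidate feature subset B, using the labeled samples only, and greedily adds the feature that reduces this entropy the most.

```python
# Hypothetical sketch of entropy-driven forward feature selection on a partially
# labeled dataset. Not the paper's algorithm; it only illustrates ranking candidate
# features by the drop in conditional entropy H(D | B) computed over rough-set
# equivalence classes of the labeled samples.
from collections import Counter, defaultdict
from math import log2
from typing import List, Optional, Sequence


def conditional_entropy(rows: Sequence[Sequence], labels: Sequence, feats: List[int]) -> float:
    """H(D | B): entropy of the labels within each equivalence class induced by feats."""
    n = len(labels)
    if not feats:  # empty feature set: plain entropy H(D) of the labels
        counts = Counter(labels)
        return -sum(c / n * log2(c / n) for c in counts.values())
    blocks = defaultdict(list)
    for row, y in zip(rows, labels):
        blocks[tuple(row[f] for f in feats)].append(y)  # equivalence class key
    h = 0.0
    for block in blocks.values():
        p_block = len(block) / n
        counts = Counter(block)
        h -= p_block * sum(c / len(block) * log2(c / len(block)) for c in counts.values())
    return h


def forward_select(rows, labels, n_features: int, eps: float = 1e-9) -> List[int]:
    """Greedily add the feature giving the largest entropy drop until no drop remains."""
    selected: List[int] = []
    current = conditional_entropy(rows, labels, selected)
    while True:
        best_gain, best_f = 0.0, None
        for f in range(n_features):
            if f in selected:
                continue
            gain = current - conditional_entropy(rows, labels, selected + [f])
            if gain > best_gain:
                best_gain, best_f = gain, f
        if best_f is None or best_gain <= eps:
            return selected
        selected.append(best_f)
        current -= best_gain


# Toy usage: only the labeled rows (the small labeled sample) drive the selection;
# unlabeled rows (label None) are simply ignored in this simplified sketch.
data = [(1, 0, 2), (1, 1, 2), (0, 1, 3), (0, 0, 3), (1, 0, 3)]
ys: List[Optional[str]] = ["a", "a", "b", None, None]
labeled = [(r, y) for r, y in zip(data, ys) if y is not None]
rows_l, ys_l = zip(*labeled)
print(forward_select(list(rows_l), list(ys_l), n_features=3))  # e.g. [0]
```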

Key words: Feature selection, Information entropy, Semi-supervised, Small labeled data

CLC Number: TP181