Computer Science (计算机科学) ›› 2020, Vol. 47 ›› Issue (6A): 480-484. doi: 10.11896/jsjkx.20190800095

• Database & Big Data & Data Science •

  • Corresponding author: ZHAO Zhi-gang (zhaolhx@163.com)
  • About author: ljx7130@163.com

Improved Locality and Similarity Preserving Feature Selection Algorithm

LI Jin-xia1, ZHAO Zhi-gang1, LI Qiang1, LV Hui-xian2 and LI Ming-sheng1   

  1 College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
    2 College of Automation and Electrical Engineering, Qingdao University, Qingdao, Shandong 266071, China
  • Published: 2020-07-07
  • About author: LI Jin-xia, born in 1994, postgraduate. Her main research interests include machine learning.
    ZHAO Zhi-gang, born in 1973, professor, is a member of China Computer Federation. His main research interests include image processing, machine learning and compressed sensing.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2017YFB0203102).


Abstract: The LSPE (Locality and Similarity Preserving Embedding) feature selection algorithm first builds a KNN-based graph structure to preserve the locality of the data, and then learns low-dimensional reconstruction coefficients over this predefined graph to preserve both the locality and the similarity of the data. The two steps are carried out independently and lack interaction. Because the number of nearest neighbors is fixed by hand, the learned graph structure does not have adaptive neighbors and is not optimal, which degrades the performance of the algorithm. To optimize the performance of LSPE, an improved locality and similarity preserving feature selection algorithm is proposed. It incorporates graph learning, sparse reconstruction and feature selection into a single framework, so that graph learning and sparse coding are carried out simultaneously, and the coding process is required to be sparse, adaptive-neighbor and non-negative. The goal is to find a projection that preserves the locality and similarity of the data; an l2,1-norm is imposed on the projection matrix, so that the relevant locality- and similarity-preserving features are selected. Experimental results show that the improved algorithm reduces subjective human influence, eliminates the instability of feature selection, is more robust to data noise, and improves the accuracy of image classification.
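As a rough illustration of the two ingredients described in the abstract, the following Python sketch shows (1) a closed-form adaptive-neighbor weighting in the style of adaptive-neighbor graph learning, which yields sparse, non-negative coefficients without a hand-fixed KNN graph, and (2) ranking features by the row-wise l2-norms of a projection matrix, which is how an l2,1-norm regularizer identifies relevant features. The function names and the specific weighting formula are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def adaptive_neighbor_weights(dists, k):
    """Closed-form solution of
        min_s  sum_j d_j * s_j + gamma * s_j**2
        s.t.   s >= 0,  sum_j s_j = 1,
    with gamma chosen so that exactly k weights are non-zero
    (adaptive-neighbor style).  `dists` holds squared distances from one
    sample to the other samples (self-distance excluded); requires
    k < len(dists)."""
    dists = np.asarray(dists, dtype=float)
    order = np.argsort(dists)          # nearest samples first
    d = dists[order]
    s = np.zeros_like(dists)
    denom = k * d[k] - d[:k].sum()
    if denom <= 0:                     # degenerate ties: fall back to uniform
        s[order[:k]] = 1.0 / k
    else:                              # closer samples get larger weights
        s[order[:k]] = (d[k] - d[:k]) / denom
    return s

def select_features(W, num_features):
    """Rank features by the l2-norm of each row of the projection matrix
    W (d features x k dims): an l2,1-norm regularizer drives whole rows
    of W toward zero, so rows with large norms mark the selected features."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:num_features]
```

For example, with squared distances [1, 2, 4, 8] and k = 2, the learned weights are [0.6, 0.4, 0, 0]: sparse, non-negative and summing to one, with no weight wasted on far-away samples.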

Key words: Feature selection, Locality and similarity preserving, Sparse reconstruction, Unsupervised learning

CLC number: TP391.4
[1] LI T, MENG Z, NI B, et al. Robust geometric p-norm feature pooling for image classification and action recognition. Image & Vision Computing, 2016, 55(P2): 64-76.
[2] ZHAO Z, LIU H. Semi-supervised feature selection via spectral analysis // Proceedings of the 2007 SIAM International Conference on Data Mining. Minneapolis, Minnesota: SIAM, 2007: 26-28.
[3] LIU Y, NIE F, WU J, et al. Efficient semi-supervised feature selection with noise insensitive trace ratio criterion. Neurocomputing, 2013, 105: 12-18.
[4] CHANG X, NIE F P, YANG Y, et al. A convex formulation for semi-supervised multi-label feature selection // Proc. of the 28th AAAI Conference on Artificial Intelligence, 2014: 1171-1177.
[5] ROWEIS S T. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000, 290(5500): 2323-2326.
[6] BELKIN M, NIYOGI P. Laplacian eigenmaps for dimensionality reduction and data representation. MIT Press, 2003.
[7] BENGIO Y, VINCENT P. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering // International Conference on Neural Information Processing Systems. MIT Press, 2003.
[8] HE X, YAN S, HU Y, et al. Face recognition using Laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(3): 328-340.
[9] HE X. Neighborhood preserving embedding // Tenth IEEE International Conference on Computer Vision, 2005: 1208-1213.
[10] QIAO L, CHEN S, TAN X. Sparsity preserving projections with applications to face recognition. Pattern Recognition, 2010, 43(1): 331-341.
[11] KAI Y, TONG Z, GONG Y. Nonlinear learning using local coordinate coding // Advances in Neural Information Processing Systems, 2009: 2223-2231.
[12] FANG X, XU Y, LI X, et al. Locality and similarity preserving embedding for feature selection. Neurocomputing, 2014, 128: 304-315.
[13] WANG J, YANG J, KAI Y, et al. Locality-constrained linear coding for image classification // Computer Vision & Pattern Recognition, 2010: 3360-3367.
[14] FANG X, XU Y, LI X, et al. Learning a nonnegative sparse graph for linear regression. IEEE Transactions on Image Processing, 2015, 24(9): 2760-2771.
[15] YANG J, ZHANG Y. Alternating direction algorithms for l1-problems in compressive sensing. SIAM Journal on Scientific Computing, 2011, 33(1): 250-278.
[16] FUKUNAGA K. Introduction to Statistical Pattern Recognition. Academic Press, 1972.
[17] NIE F, HUANG H, CAI X, et al. Efficient and robust feature selection via joint l2,1-norms minimization // Advances in Neural Information Processing Systems, 2010: 1813-1821.
[18] HAN Y, XU Z, MA Z, et al. Image classification with manifold learning for out-of-sample data. Signal Processing, 2013, 93(8): 2169-2177.
[19] YAN F, WANG X D. A semi-supervised feature selection method based on local discriminant constraint. Pattern Recognition and Artificial Intelligence, 2017, 30(1): 89-95.
[1] 宋杰, 梁美玉, 薛哲, 杜军平, 寇菲菲.
Scientific Paper Heterogeneous Graph Node Representation Learning Method Based on Unsupervised Clustering Level
Computer Science, 2022, 49(9): 64-69. https://doi.org/10.11896/jsjkx.220500196
[2] 李斌, 万源.
Unsupervised Multi-view Feature Selection Based on Similarity Matrix Learning and Matrix Alignment
Computer Science, 2022, 49(8): 86-96. https://doi.org/10.11896/jsjkx.210700124
[3] 胡艳羽, 赵龙, 董祥军.
Two-stage Deep Feature Selection Extraction Algorithm for Cancer Classification
Computer Science, 2022, 49(7): 73-78. https://doi.org/10.11896/jsjkx.210500092
[4] 康雁, 王海宁, 陶柳, 杨海潇, 杨学昆, 王飞, 李浩.
Hybrid Improved Flower Pollination Algorithm and Gray Wolf Algorithm for Feature Selection
Computer Science, 2022, 49(6A): 125-132. https://doi.org/10.11896/jsjkx.210600135
[5] 储安琪, 丁志军.
Application of Gray Wolf Optimization Algorithm on Synchronous Processing of Sample Equalization and Feature Selection in Credit Evaluation
Computer Science, 2022, 49(4): 134-139. https://doi.org/10.11896/jsjkx.210300075
[6] 孙林, 黄苗苗, 徐久成.
Weak Label Feature Selection Method Based on Neighborhood Rough Sets and Relief
Computer Science, 2022, 49(4): 152-160. https://doi.org/10.11896/jsjkx.210300094
[7] 李宗然, 陈秀宏, 陆赟, 邵政毅.
Robust Joint Sparse Uncorrelated Regression
Computer Science, 2022, 49(2): 191-197. https://doi.org/10.11896/jsjkx.210300034
[8] 侯宏旭, 孙硕, 乌尼尔.
Survey of Mongolian-Chinese Neural Machine Translation
Computer Science, 2022, 49(1): 31-40. https://doi.org/10.11896/jsjkx.210900006
[9] 张叶, 李志华, 王长杰.
Kernel Density Estimation-based Lightweight IoT Anomaly Traffic Detection Method
Computer Science, 2021, 48(9): 337-344. https://doi.org/10.11896/jsjkx.200600108
[10] 杨蕾, 降爱莲, 强彦.
Structure Preserving Unsupervised Feature Selection Based on Autoencoder and Manifold Regularization
Computer Science, 2021, 48(8): 53-59. https://doi.org/10.11896/jsjkx.200700211
[11] 侯春萍, 赵春月, 王致芃.
Video Abnormal Event Detection Algorithm Based on Self-feedback Optimal Subclass Mining
Computer Science, 2021, 48(7): 199-205. https://doi.org/10.11896/jsjkx.200800146
[12] 胡艳梅, 杨波, 多滨.
Logistic Regression with Regularization Based on Network Structure
Computer Science, 2021, 48(7): 281-291. https://doi.org/10.11896/jsjkx.201100106
[13] 周钢, 郭福亮.
Research on Ensemble Learning Method Based on Feature Selection for High-dimensional Data
Computer Science, 2021, 48(6A): 250-254. https://doi.org/10.11896/jsjkx.200700102
[14] 丁思凡, 王锋, 魏巍.
Relief Feature Selection Algorithm Based on Label Correlation
Computer Science, 2021, 48(4): 91-96. https://doi.org/10.11896/jsjkx.200800025
[15] 滕俊元, 高猛, 郑小萌, 江云松.
Noise Tolerable Feature Selection Method for Software Defect Prediction
Computer Science, 2021, 48(12): 131-139. https://doi.org/10.11896/jsjkx.201000168