Computer Science, 2019, Vol. 46, Issue (2): 62-67. doi: 10.11896/j.issn.1002-137X.2019.02.010

• Big Data & Data Science •

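Sparse Feature Selection Algorithm Based on Kernel Function

ZHANG Shan-wen, WEN Guo-qiu, ZHANG Le-yuan, LI Jia-ye

  1. Guangxi Key Lab of Multi-source Information Mining & Security, College of Computer Science and Information Engineering, Guangxi Normal University, Guilin, Guangxi 541004, China
  • Received: 2018-08-03 Online: 2019-02-25 Published: 2019-02-25
  • Corresponding author: WEN Guo-qiu (born 1987), female, master's degree, lecturer. Her main research interests are fundamental mathematics and machine learning. E-mail: wenguoqiu2008@163.com
  • About the authors: ZHANG Shan-wen (born 1991), male, master's student; his main research interests are data mining and machine learning. ZHANG Le-yuan (born 1995), male, master's student; his main research interests are data mining and machine learning. LI Jia-ye (born 1993), male, master's student; his main research interests are data mining and machine learning.
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61170131, 61263035, 61573270, 90718020), the China Postdoctoral Science Foundation (2015M570837), the Guangxi Natural Science Foundation (2015GXNSFCB139011, 2015GXNSFAA139306), the National Key Research and Development Program of China (2016YFB1000905), and the Guangxi Science and Technology Base and Talent Program (Guike 541804573).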

Abstract: Traditional feature selection algorithms cannot capture the relationships between features. To address this problem, this paper proposes a nonlinear feature selection method. The method introduces a kernel function to project the original data set into a high-dimensional kernel space; because the computation is carried out in that space, the relationships between features can be taken into account. Owing to the properties of kernel functions, the computational complexity stays low even when a Gaussian kernel projects the data into an infinite-dimensional space. For regularization, two norms are applied as a double constraint, which not only improves accuracy but also keeps the variance of the experimental results at only 0.74, far smaller than that of the comparison algorithms, so the algorithm is more stable. The proposed algorithm was compared with 6 similar algorithms on 8 commonly used data sets, with an SVM classifier used to measure classification accuracy. It improves classification accuracy by at least 1.84%, at most 3.27%, and 2.75% on average.

Key words: L1-norm, L2,1-norm, Feature selection, Kernel function, Sparse
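
The abstract names two ingredients without giving formulas: a Gaussian kernel and a double constraint built from an L1-norm and an L2,1-norm. The sketch below illustrates both in plain Python. It is not the authors' implementation. The function names, the bandwidth parameter `sigma`, the weights `lam1` and `lam2`, and the choice to combine the two norms as a weighted sum are all assumptions made for illustration, since the abstract does not state the objective function.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def gaussian_kernel_matrix(samples: Sequence[Vector], sigma: float = 1.0) -> Matrix:
    """Return K with K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    Only pairwise distances in the input space are needed, so the
    infinite-dimensional feature map is never built explicitly.
    """
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            value = math.exp(-sq_dist / (2.0 * sigma ** 2))
            kernel[i][j] = kernel[j][i] = value  # the kernel matrix is symmetric
    return kernel


def l1_norm(weights: Matrix) -> float:
    """Sum of absolute values of all entries (encourages element-wise sparsity)."""
    return sum(abs(w) for row in weights for w in row)


def l21_norm(weights: Matrix) -> float:
    """Sum of the Euclidean norms of the rows (encourages whole rows to vanish)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in weights)


def double_penalty(weights: Matrix, lam1: float, lam2: float) -> float:
    """Hypothetical combined regularizer: lam1 * ||W||_1 + lam2 * ||W||_{2,1}."""
    return lam1 * l1_norm(weights) + lam2 * l21_norm(weights)
```

For the weight matrix `[[3.0, -4.0], [0.0, 0.0]]`, the L1-norm is 7 and the L2,1-norm is 5. The all-zero second row adds nothing to either norm, which is the mechanism by which a row-sparse penalty can discard a feature, assuming each row of the weight matrix corresponds to one feature.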

CLC Number: 

  • TP181