Computer Science, 2022, Vol. 49, Issue (8): 86-96. doi: 10.11896/jsjkx.210700124

• Database & Big Data & Data Science •

Unsupervised Multi-view Feature Selection Based on Similarity Matrix Learning and Matrix Alignment

LI Bin, WAN Yuan   

  1. School of Science, Wuhan University of Technology, Wuhan 430070, China
  • Received: 2021-07-13  Revised: 2022-02-27  Published: 2022-08-02
  • Corresponding author: WAN Yuan (wanyuan@whut.edu.cn)
  • About author: LI Bin, born in 1997, postgraduate (2859713954@qq.com). His main research interests include machine learning, pattern recognition and dimensionality reduction of high-dimensional image features.
    WAN Yuan, born in 1976, Ph.D., professor, is a member of China Computer Federation. Her main research interests include machine learning, image processing and pattern recognition.
  • Supported by:
    Fundamental Research Funds for the Central Universities (2021III030JC).

Abstract: Multi-view feature selection improves the efficiency of learning tasks such as classification and clustering by fusing information from multiple views to obtain a representative feature subset. However, the features that describe an object are numerous, diverse and interrelated. Simply selecting a feature subspace from the original features addresses the dimensionality problem, but it cannot effectively capture the latent structural information of the data or the associations among features, and using fixed similarity and projection matrices tends to lose the correlation between views. To solve these problems, an unsupervised multi-view feature selection algorithm based on similarity matrix learning and matrix alignment (SMLMA) is proposed. First, similarity matrices are constructed for all views, and a consistent similarity matrix and a projection matrix are obtained through manifold learning, so that the structural information of the multi-view data is discovered and preserved to the greatest extent. Then, matrix alignment is applied to maximize the correlation between the similarity matrix and the kernel matrix, which exploits the correlation between different views and reduces the information redundancy of the selected feature subset. Finally, the Armijo search method is used to reach convergence quickly. Experimental results on four datasets (Caltech-7, NUS-WIDE-OBJ, Toy Animal and MSRC-v1) show that, compared with single-view feature selection and several multi-view feature selection methods, the proposed algorithm improves clustering accuracy by about 7.54% on average. It preserves the structural information of the data and the correlations among multi-view features well, and captures more high-quality features.

Key words: Feature selection, Matrix alignment, Multi-view, Similarity matrix, Unsupervised
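
The matrix alignment step described in the abstract maximizes the correlation between the learned similarity matrix and each view's kernel matrix. The sketch below is only a rough illustration of that idea, not the authors' formulation: it builds Gaussian-kernel similarity matrices for two hypothetical views with NumPy, forms a placeholder consensus by simple averaging (SMLMA instead learns the consensus via manifold learning), and scores the Frobenius inner-product alignment between the consensus and each view's kernel. All variable names and constants here are illustrative assumptions.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Pairwise Gaussian (RBF) similarity matrix for one view."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared Euclidean distances
    return np.exp(-d2 / (2 * sigma ** 2))

def alignment(A, B):
    """Frobenius alignment <A, B>_F / (||A||_F * ||B||_F)."""
    return np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B))

# Two hypothetical views of the same 100 samples (feature dimensions are arbitrary).
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(100, 20)), rng.normal(size=(100, 35))

S1, S2 = gaussian_similarity(X1), gaussian_similarity(X2)
S = (S1 + S2) / 2   # placeholder consensus; SMLMA learns this matrix instead of averaging

for name, K in [("view 1", S1), ("view 2", S2)]:
    print(f"alignment(consensus, {name} kernel) = {alignment(S, K):.3f}")
```

A higher alignment score indicates that the consensus similarity structure is more consistent with that view's kernel, which is the kind of quantity an alignment term in the objective would push up while redundant features are discarded.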

CLC Number: 

  • TP181
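
The abstract also notes that the Armijo search method is used to reach convergence quickly. The following is a generic backtracking line search satisfying the Armijo sufficient-decrease condition, shown as a textbook sketch rather than code from the paper; the objective, constants and step sizes are arbitrary placeholders.

```python
import numpy as np

def armijo_step(f, grad, x, direction, alpha0=1.0, beta=0.5, c=1e-4, max_iter=50):
    """Backtracking line search: shrink alpha until the Armijo condition
    f(x + alpha*d) <= f(x) + c * alpha * grad(x)^T d holds."""
    fx = f(x)
    slope = grad(x) @ direction   # directional derivative, negative for a descent direction
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * direction) <= fx + c * alpha * slope:
            return alpha
        alpha *= beta
    return alpha

# Toy usage: one gradient-descent step on f(x) = ||x||^2.
f = lambda x: float(x @ x)
grad = lambda x: 2 * x
x = np.array([3.0, -4.0])
d = -grad(x)                      # descent direction
alpha = armijo_step(f, grad, x, d)
print(alpha, x + alpha * d, f(x + alpha * d))
```

Shrinking the step size geometrically until sufficient decrease holds gives a reliable reduction of the objective at every iteration without a hand-tuned learning rate, which is why Armijo-type searches are commonly paired with this kind of non-convex matrix optimization.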