Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240500021-8. doi: 10.11896/jsjkx.240500021

• Big Data & Data Science •

Local Linear Embedding Algorithm Based on Probability Model and Information Entropy

LIU Yuanhong, WU Yubin

  1. School of Electrical Engineering & Information, Northeast Petroleum University, Daqing, Heilongjiang 163318, China
  • Online: 2025-06-16  Published: 2025-06-12
  • Corresponding author: LIU Yuanhong (liuyuanhong@nepu.edu.cn)
  • About authors: LIU Yuanhong, born in 1979, professor. His main research interests include nonlinear dimension reduction, machine learning and pattern recognition, and signal processing.
    WU Yubin, born in 2000, postgraduate. Her main research interests include data dimensionality reduction, manifold learning and pattern recognition.
  • Supported by: Natural Science Foundation of Hainan Province, China (623MS071).

Abstract: The local linear embedding (LLE) algorithm selects neighborhood points by Euclidean distance, which usually loses the nonlinear features of the dataset itself and leads to incorrect neighborhood selection; constructing the weights from Euclidean distance alone also leaves the information in the data insufficiently mined. To address these issues, a local linear embedding algorithm based on a probability model and information entropy (PIE-LLE) is proposed. First, to make the selection of neighborhood points more reasonable, the probability distributions of each sample point and of its neighbors are considered from the perspective of the dataset's probability distribution, and a neighborhood set that conforms to the local distribution is constructed for each sample point. Second, to fully extract the local structural information of the samples, the probability of the neighborhood to which each sample belongs and the information entropy of each sample are computed in the weight-construction stage, and the two kinds of information are fused to reconstruct the low-dimensional samples. Finally, experiments on two bearing fault datasets show that the fault identification accuracy of the proposed method reaches up to 100%, higher than that of the comparison algorithms; within the range of 5 to 15 neighborhood points, PIE-LLE exhibits good low-dimensional visualization performance; and in the parameter sensitivity experiments, the algorithm maintains a relatively large Fisher index, effectively improving its classification accuracy and stability.

Key words: Local linear embedding algorithm, Probability model, Information entropy, Feature extraction, Fault diagnosis
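
As a rough illustration of the three steps described in the abstract, the Python sketch below wires a probability model and per-sample information entropy into a standard LLE pipeline. The abstract does not specify the exact probability model, entropy definition, or fusion rule, so the Gaussian-kernel affinities, the density-weighted neighbor ranking, the Shannon entropy of neighbor affinities, and the blending rule used here are illustrative assumptions rather than the authors' formulation; the function name pie_lle_sketch and its parameters are likewise hypothetical.

# A minimal, self-contained sketch in the spirit of PIE-LLE; the probability
# model, entropy measure, and fusion rule are assumptions for illustration.
import numpy as np
from scipy.linalg import eigh


def pie_lle_sketch(X, n_neighbors=10, n_components=2, bandwidth=1.0, reg=1e-3):
    """Embed X (n_samples x n_features) into n_components dimensions."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

    # Step 1: probability-guided neighborhood selection (assumed model).
    # Gaussian affinities serve as a simple local probability model; neighbors
    # are ranked by affinity weighted by the candidate's density estimate, so
    # points in sparse, atypical regions are less likely to be chosen than with
    # raw Euclidean k-nearest neighbors.
    P = np.exp(-d2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(P, 0.0)
    density = P.mean(axis=1)                  # crude kernel density estimate
    score = P * density[None, :]
    neighbors = np.argsort(-score, axis=1)[:, :n_neighbors]

    # Step 2: entropy-weighted local reconstruction.
    W = np.zeros((n, n))
    for i in range(n):
        idx = neighbors[i]
        # Normalized affinities to the chosen neighbors and their Shannon
        # entropy; high entropy means a nearly uniform local distribution.
        p = P[i, idx] / P[i, idx].sum()
        H = -np.sum(p * np.log(p + 1e-12))
        h = H / np.log(n_neighbors)           # entropy scaled to [0, 1]
        # Classic constrained least-squares LLE weights on the neighborhood.
        Z = X[idx] - X[i]
        C = Z @ Z.T
        C += reg * (np.trace(C) + 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        w /= w.sum()
        # Illustrative fusion of the geometric weights with the probabilistic
        # and entropy information (not the paper's exact rule).
        w = (1.0 - h) * w + h * p
        W[i, idx] = w / w.sum()

    # Step 3: standard LLE embedding from the reconstruction weights.
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    vals, vecs = eigh(M)
    # Drop the trivial constant eigenvector associated with eigenvalue ~0.
    return vecs[:, 1:n_components + 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))             # stand-in for bearing features
    Y = pie_lle_sketch(X, n_neighbors=10, n_components=2)
    print(Y.shape)                            # (200, 2)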

CLC Number: TP391