Computer Science ›› 2021, Vol. 48 ›› Issue (3): 214-219. doi: 10.11896/jsjkx.191200103

• Artificial Intelligence •

Label Propagation Algorithm Based on Weighted Samples and Consensus-rate

CHU Jie, ZHANG Zheng-jun, TANG Xin-yao, HUANG Zhen-sheng

  1. School of Science, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received: 2019-12-16 Revised: 2020-04-29 Online: 2021-03-15 Published: 2021-03-05
  • Corresponding author: ZHANG Zheng-jun (zzjnj@163.com)
  • Author e-mail: jiecnj@163.com
  • About author: CHU Jie, born in 1996, postgraduate. His main research interests include data mining and machine learning.
    ZHANG Zheng-jun, born in 1965, Ph.D, associate professor. His main research interests include data mining and graphic images.
  • Supported by:
    National Statistical Science Research Major Program of China (2018LD01).

Abstract: Label propagation is one of the most widely used semi-supervised classification methods. The consensus rate-based label propagation (CRLP) algorithm constructs its graph from the consensus rate obtained by summarizing multiple clustering solutions, which incorporates various properties of the data. However, like most graph-based semi-supervised classification methods, CRLP treats every labeled sample in the graph as equally important and improves performance mainly by optimizing the graph structure. In fact, samples are not always evenly distributed, and different samples differ in their importance to the algorithm. Moreover, CRLP is easily affected by the number of clusters and the clustering methods used, and it adapts poorly to low-dimensional data. To address these problems, a label propagation algorithm based on weighted samples and consensus-rate (WSCRLP) is proposed. WSCRLP first clusters the dataset multiple times to explore the structure of the samples and combines the consensus rate with the local information of the samples to construct a graph. It then assigns different weights to labeled samples with different distributions. Finally, it performs semi-supervised classification based on the constructed graph and the weighted samples. Experiments on real datasets show that the sample weighting and graph construction of WSCRLP significantly improve classification accuracy, and that WSCRLP outperforms the compared methods in 84% of the experiments. Compared with CRLP, WSCRLP not only performs better but is also robust to its input parameters.

Key words: Consensus-rate, Label propagation, Semi-supervised classification, Weighted samples

CLC Number:

  • TP301.6
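
The three steps named in the abstract (repeated clustering, graph construction from the consensus rate, propagation from weighted labeled samples) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration and not the authors' method: the consensus rate is taken to be the fraction of clusterings in which two samples share a cluster, the local-information term of the graph is omitted, propagation uses the standard closed-form label spreading with symmetric normalization, and the sample weights are placeholder values. The names `consensus_rate` and `propagate` are hypothetical.

```python
import numpy as np

def consensus_rate(partitions):
    """Fraction of clusterings in which each pair of samples shares a cluster."""
    partitions = np.asarray(partitions)            # (n_clusterings, n_samples)
    same = partitions[:, :, None] == partitions[:, None, :]
    return same.mean(axis=0)                       # (n_samples, n_samples)

def propagate(W, labels, sample_weights, alpha=0.9):
    """Closed-form label spreading on graph W (assumed stand-in technique).

    labels: int array, -1 marks unlabeled samples.
    sample_weights: per-sample weights; only labeled entries are used.
    """
    W = W.astype(float).copy()
    np.fill_diagonal(W, 0.0)                       # drop self-loops
    d = W.sum(axis=1)
    d[d == 0] = 1.0                                # guard isolated nodes
    S = W / np.sqrt(np.outer(d, d))                # D^(-1/2) W D^(-1/2)
    classes = np.unique(labels[labels >= 0])
    Y = np.zeros((len(labels), len(classes)))
    for k, c in enumerate(classes):
        Y[labels == c, k] = sample_weights[labels == c]   # weighted seeds
    F = np.linalg.solve(np.eye(len(labels)) - alpha * S, Y)
    return classes[F.argmax(axis=1)]

# Three hypothetical clusterings of six samples (in practice these would
# come from running several clustering methods on the dataset).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
C = consensus_rate(partitions)

labels = np.array([0, -1, -1, -1, -1, 1])          # two labeled samples
equal = np.ones(6)                                 # every label counts the same
boosted = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.3]) # placeholder: favor sample 5

pred_equal = propagate(C, labels, equal)
pred_boost = propagate(C, labels, boosted)
print(pred_equal)
print(pred_boost)
```

In this toy setup the only difference between the two runs is the weight on one labeled sample, which is enough to change the predicted class of a borderline unlabeled sample. That is the kind of effect the weighting step is meant to exploit, although how WSCRLP actually derives its weights from the sample distribution is not stated in the abstract.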