Computer Science ›› 2019, Vol. 46 ›› Issue (9): 15-21. doi: 10.11896/j.issn.1002-137X.2019.09.002

Surveys

Survey of Semi-supervised Clustering

QIN Yue1, DING Shi-fei1,2   

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2018-09-13 Online: 2019-09-15 Published: 2019-09-02

Abstract: Semi-supervised clustering is a new learning method that combines semi-supervised learning with cluster analysis, and it is widely used in machine learning. Traditional unsupervised clustering algorithms use no prior information about the data when partitioning it, but in practical applications a small number of samples often carry supervised information in the form of independent class labels or pairwise constraints. Researchers have therefore sought to incorporate this limited supervised information into clustering to obtain better results, which led to semi-supervised clustering. This paper introduces the theoretical basis and algorithmic ideas of semi-supervised clustering and summarizes its latest progress. First, the current state and taxonomy of semi-supervised learning are reviewed, and generative semi-supervised learning, semi-supervised SVM, graph-based semi-supervised learning and co-training are compared. Second, clustering within semi-supervised learning is described in detail: four typical semi-supervised clustering algorithms (Cop-Kmeans, LCop-Kmeans, Seeded-Kmeans and SC-Kmeans) are analyzed and summarized, and their advantages and disadvantages are evaluated. Then, the research status of semi-supervised clustering is discussed for two cases, constraint-based semi-supervised clustering and distance-based semi-supervised clustering. Finally, applications of semi-supervised clustering in bioinformatics, image segmentation and other areas of computing are discussed, along with future research directions. This paper aims to help beginners quickly grasp the progress of semi-supervised clustering and understand the ideas behind the typical algorithms, and to serve as a guide for subsequent practical applications.
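
To illustrate how the pairwise constraints mentioned in the abstract can steer a clustering step, the following minimal Python sketch shows a single constrained assignment pass in the spirit of Cop-Kmeans. It is not taken from the paper: the function names `violates` and `assign_with_constraints`, the use of squared Euclidean distance, and the in-order greedy assignment are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]
Point = Sequence[float]


def violates(i: int, cluster: int, labels: Dict[int, int],
             must_link: Sequence[Pair], cannot_link: Sequence[Pair]) -> bool:
    """Return True if placing point i in `cluster` breaks a pairwise constraint.

    Only partners that have already been assigned are checked.
    """
    for a, b in must_link:
        if i in (a, b):
            other = b if i == a else a
            # A must-link partner already sits in a different cluster.
            if other in labels and labels[other] != cluster:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            other = b if i == a else a
            # A cannot-link partner already sits in this cluster.
            if other != i and labels.get(other) == cluster:
                return True
    return False


def assign_with_constraints(points: Sequence[Point], centers: Sequence[Point],
                            must_link: Sequence[Pair] = (),
                            cannot_link: Sequence[Pair] = ()) -> List[int]:
    """Assign each point to its nearest center that violates no constraint."""
    labels: Dict[int, int] = {}
    for i, p in enumerate(points):
        # Candidate clusters ordered by squared Euclidean distance to p.
        order = sorted(
            range(len(centers)),
            key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])),
        )
        for c in order:
            if not violates(i, c, labels, must_link, cannot_link):
                labels[i] = c
                break
        else:
            # No cluster satisfies the constraints for this point.
            raise ValueError(f"no feasible cluster for point {i}")
    return [labels[i] for i in range(len(points))]


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
unconstrained = assign_with_constraints(points, centers)
with_cannot_link = assign_with_constraints(points, centers, cannot_link=[(0, 1)])
with_must_link = assign_with_constraints(points, centers, must_link=[(0, 2)])
```

In a full algorithm this pass would alternate with recomputing the cluster centers until the assignments stop changing; the sketch covers only the assignment step, where the constraints take effect.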

Key words: Clustering, Label, Machine learning, Pairwise constraints, Semi-supervised clustering, Semi-supervised learning

CLC Number: 

  • TP181
