Computer Science (计算机科学), 2010, Vol. 37, Issue (7): 205-207.

• Artificial Intelligence •

A Semi-supervised Clustering Algorithm for High-Dimensional Sparse Data

崔鹏,张汝波   

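  1. (College of Computer Science and Technology, Harbin Engineering University, Harbin 150001); (College of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080)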
  • Publication date: 2018-12-01 Release date: 2018-12-01
  • Funding:
    This work was supported by a key fund project of the National 863 Program (2009AA04Z215).

Novel Semi-supervised Clustering for High Dimensional Data

CUI Peng,ZHANG Ru-bo   

  • Online:2018-12-01 Published:2018-12-01

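Abstract: Semi-supervised clustering has been a research hotspot in recent years. Traditional methods add limited background knowledge to an unsupervised algorithm to improve clustering performance. However, most semi-supervised clustering techniques are based on proximity or density and have difficulty handling high-dimensional data, so feature reduction must be incorporated into the semi-supervised clustering process. To address this problem, a new semi-supervised clustering framework is proposed. The algorithm first preprocesses the samples using the transitivity of the sample constraints, then projects the features into a low-dimensional space to reduce dimensionality, and finally clusters the reduced samples with a semi-supervised algorithm. An experimental comparison with the main current dimensionality reduction methods shows that the method handles high-dimensional data effectively and clusters well.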

Keywords: dimensionality reduction, semi-supervised clustering, feature selection, constraints

Abstract: Semi-supervised clustering has been a popular research topic in recent years; it usually incorporates limited background knowledge to improve clustering performance. However, most existing methods are based on neighbors or density and cannot handle high-dimensional data, so it is critical to merge feature reduction into the semi-supervised clustering process. To solve this problem, we propose a framework for semi-supervised clustering. The framework first preprocesses the instances using the transitivity of constraints, then reduces dimensionality by projecting the features into a low-dimensional space, and finally clusters the instances with the reduced features. To evaluate the effectiveness of the method, we ran experiments on datasets; the results show that the method achieves good clustering performance on high-dimensional data.
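
The abstract does not spell out how the transitivity of constraints is used in the preprocessing step. One common reading, assumed here, is that pairwise must-link constraints are closed transitively and that each cannot-link constraint is extended to every pair across the two must-link groups it touches. The sketch below illustrates only that reading; the function name `close_constraints` and its arguments are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def close_constraints(n, must_link, cannot_link):
    """Return the closure of pairwise constraints over n samples.

    must_link / cannot_link are iterables of (i, j) index pairs.
    Hypothetical helper for illustration; not the authors' procedure.
    """
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Must-link behaves like an equivalence relation: merge linked samples.
    for i, j in must_link:
        parent[find(i)] = find(j)

    # Group samples by the root of their must-link component.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Every pair inside a component is an implied must-link.
    ml_closed = set()
    for members in groups.values():
        ml_closed.update(combinations(members, 2))

    # A cannot-link between two components applies to every cross pair.
    cl_closed = set()
    for i, j in cannot_link:
        ri, rj = find(i), find(j)
        if ri == rj:
            raise ValueError(f"inconsistent constraints on samples {i} and {j}")
        for a in groups[ri]:
            for b in groups[rj]:
                cl_closed.add((min(a, b), max(a, b)))
    return ml_closed, cl_closed


ml, cl = close_constraints(5, [(0, 1), (1, 2)], [(2, 3)])
print(sorted(ml))  # [(0, 1), (0, 2), (1, 2)]
print(sorted(cl))  # [(0, 3), (1, 3), (2, 3)]
```

In this reading, the closed constraint sets would then be handed to the projection and clustering steps. Neither of those steps is sketched here because the abstract does not name the specific techniques used.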

Key words: Dimensionality reduction, Semi-supervised clustering, Feature selection, Constraints
