Computer Science (计算机科学), 2011, Vol. 38, Issue (2): 171-174.

• Database and Data Mining •

Structure-based Entropy Clustering Algorithm for Heterogeneous Data

LI Zhi-hua, GU Yan, CHEN Meng-tao, WANG Shi-tong, CHEN Xiu-hong

  1. (School of Information Engineering, Jiangnan University, Wuxi 214122) (School of Digital Media, Jiangnan University, Wuxi 214122)
  • Published: 2018-11-16  Online: 2018-11-16
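  • Funding:
    This work was supported by the Young Scientists Fund of the National Natural Science Foundation of China (60704047) and by the 2007 Major Cultivation Project of the Ministry of Education's Innovation Engineering Program for Higher Education Institutions.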


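Abstract: This paper studies the clustering of semantic (nominal) data and proposes a structural entropy clustering (SEC) algorithm based on the intrinsic structure of the samples. It gives a new definition of the dissimilarity measure for semantic attributes and uses it to mine the structural information implicit in the data samples. It then proposes an optimized method that computes each sample's information entropy from this structural information. The entropy is used to determine the cluster centers, which completes the clustering of the samples. The method is also extended to heterogeneous data. SEC can cluster unbalanced data and determines the initial cluster centers and the number of clusters automatically. It also needs no iteration, is efficient, and is reasonably robust. Experiments show that the algorithm is effective. Compared with existing methods in the literature, it achieves markedly higher clustering accuracy and has some practical value.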

Key words: Heterogeneous data, Dissimilarity measure, Clustering clue, Structural entropy, Clustering algorithm

Abstract: This paper studies dissimilarity measures and clustering methods for heterogeneous datasets and presents a structure-based entropy clustering (SEC) algorithm. Data often appear in homogeneous groups, and SEC uses this structural information to improve clustering accuracy. Unlike numeric data, nominal data are often unevenly distributed, and their distribution is often unrelated to their distance measure. For this reason, a new technique for computing entropy from structural information is proposed. SEC mines clustering clues in the structural information and constructs weights that reflect the different distributions of nominal and numeric attributes. In this way it automatically identifies the initial locations and the number of cluster centroids. It is also robust to initialization and requires no iteration. Experimental comparisons with methods from the literature show that the proposed method performs well.

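The abstract describes SEC only at a high level. It names a new dissimilarity for nominal attributes, a per-sample entropy computed from structural information, and cluster centers chosen by entropy without iteration. The exact formulas are not given here. The Python sketch below illustrates how entropy alone can fix both the centers and the number of clusters in a single non-iterative pass. It swaps in the well-known entropy-based fuzzy clustering idea: the minimum-entropy sample becomes a center, and samples similar to it are removed. It pairs this with a Gower-style mixed dissimilarity. All names are hypothetical and are assumptions for illustration, not the authors' definitions. These include `mixed_dissimilarity`, `entropy_centers`, the similarity mapping `exp(-alpha * d)` with `alpha = ln 2 / mean dissimilarity`, and the threshold `beta`. The sketch does not implement SEC's structure-based weights.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple, Union

Value = Union[float, str]
Row = Sequence[Value]


def mixed_dissimilarity(a: Row, b: Row, numeric_idx: Set[int],
                        ranges: Dict[int, float]) -> float:
    """Gower-style dissimilarity in [0, 1] (an assumption, not SEC's measure).

    Numeric attributes contribute |x - y| / range; nominal attributes
    contribute 0 on a match and 1 on a mismatch.
    """
    total = 0.0
    for k, (x, y) in enumerate(zip(a, b)):
        if k in numeric_idx:
            r = ranges[k]
            total += abs(float(x) - float(y)) / r if r > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


def binary_entropy(p: float) -> float:
    """Entropy in bits of a similarity value treated as a probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def entropy_centers(data: List[Row], numeric_idx: Set[int],
                    beta: float = 0.7) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: pick centers by minimum entropy.

    Returns (center indices, cluster label of every row).
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    n = len(data)
    if n < 2:
        return list(range(n)), [0] * n
    ranges = {k: max(float(r[k]) for r in data) - min(float(r[k]) for r in data)
              for k in numeric_idx}
    d = [[mixed_dissimilarity(data[i], data[j], numeric_idx, ranges)
          for j in range(n)] for i in range(n)]
    mean_d = sum(d[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # alpha maps the mean pairwise dissimilarity to a similarity of 0.5.
    alpha = math.log(2.0) / mean_d if mean_d > 0 else 1.0
    s = [[math.exp(-alpha * d[i][j]) for j in range(n)] for i in range(n)]

    # Entropy is low when a sample's similarities are close to 0 or 1.
    entropy = [sum(binary_entropy(s[i][j]) for j in range(n) if j != i)
               for i in range(n)]

    remaining = set(range(n))
    centers: List[int] = []
    labels = [-1] * n
    # No iterative refinement: each pass fixes one center permanently.
    while remaining:
        c = min(remaining, key=lambda i: (entropy[i], i))
        centers.append(c)
        # s[c][c] == 1.0 >= beta, so the center itself always leaves `remaining`.
        members = {j for j in remaining if s[c][j] >= beta}
        for j in members:
            labels[j] = len(centers) - 1
        remaining -= members
    return centers, labels
```

Consider six rows such as `(1.0, "red")`, `(1.2, "red")`, `(0.9, "red")`, `(10.0, "blue")`, `(10.3, "blue")`, `(9.8, "blue")`, with `numeric_idx={0}`. Here the two groups differ in both attributes, so the sketch finds two centers and labels each group consistently. The number of clusters is never passed in. In this simplified version the threshold `beta` alone controls how many clusters emerge, so it would need tuning on real data.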
