Computer Science, 2011, Vol. 38, Issue (7): 235-239.

• Artificial Intelligence •

Multi-instance Clustering Based on the EMD Distance

LI Zhan, PENG Jin-ye, WEN Chao

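  1. (School of Information Science and Technology, Northwest University, Xi'an 710069); (School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072)
  • Published: 2018-11-16 Online: 2018-11-16
  • Funding:
    This work was supported by the Program for New Century Excellent Talents in University of the Ministry of Education (NCET-07-0693) and the Scientific Research Program of the Shaanxi Provincial Education Department (10JK852).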

Multi-instance Clustering Based on EMD

LI Zhan, PENG Jin-ye, WEN Chao

  • Online: 2018-11-16 Published: 2018-11-16

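Abstract: In multi-instance learning, a bag consists of multiple instances; each bag carries an explicit label, whereas the labels of its instances are uncertain. Existing clustering studies all address the single-instance, single-label setting and therefore cannot be applied directly to multi-instance problems. This paper proposes ECMIL, a new multi-instance clustering algorithm based on the earth mover's distance (EMD). The method first computes the similarity of the instances within each bag using the Euclidean distance and merges similar instances. It then treats the instances of the two bags whose distance is to be measured as suppliers and consumers, respectively, and computes the amount of goods each supplier holds and the amount each consumer demands. The case in which EMD cannot complete the supply (a supplier-consumer imbalance) is handled by increasing the weights of the suppliers that satisfy the required condition. Finally, the k-medoids algorithm performs the clustering. Experiments on the benchmark data sets MUSK, Corel and SIVAL show that ECMIL is effective.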

Key words: Multi-instance clustering, Earth mover's distance, k-medoids
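The bag-to-bag distance described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the merge threshold, the greedy merging rule, the use of merged-instance counts as goods and demand, and the uniform rescaling of all suppliers' goods (the paper only says that the weights of qualifying suppliers are increased) are all our assumptions. The helper names `merge_similar`, `transport_cost` and `bag_emd` are hypothetical. The sketch solves the underlying transportation problem with a small successive-shortest-path min-cost flow and returns the usual normalised EMD, i.e. the minimum total transport cost divided by the total flow.

```python
import math

INF = float("inf")


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge_similar(bag, threshold):
    """Greedily merge instances of one bag that lie within `threshold`.

    Returns (centroid, count) pairs; the count is used below as the amount
    of goods (or demand) of the merged instance (an assumption).
    """
    groups = []  # each entry: [coordinate sums, number of instances]
    for inst in bag:
        for g in groups:
            centroid = [s / g[1] for s in g[0]]
            if euclidean(inst, centroid) <= threshold:
                g[0] = [s + x for s, x in zip(g[0], inst)]
                g[1] += 1
                break
        else:  # no close group found: start a new one
            groups.append([list(inst), 1])
    return [([s / n for s in sums], n) for sums, n in groups]


def transport_cost(supply, demand, cost, eps=1e-12):
    """Minimum total cost of shipping `supply` to `demand` (balanced case),
    using successive shortest paths (Bellman-Ford) on a residual graph."""
    m, n = len(supply), len(demand)
    src, snk = m + n, m + n + 1
    graph = [[] for _ in range(m + n + 2)]  # edge: [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, c):
        graph[u].append([v, cap, c, len(graph[v])])
        graph[v].append([u, 0.0, -c, len(graph[u]) - 1])

    for i in range(m):
        add_edge(src, i, supply[i], 0.0)
        for j in range(n):
            add_edge(i, m + j, INF, cost[i][j])
    for j in range(n):
        add_edge(m + j, snk, demand[j], 0.0)

    total, remaining = 0.0, sum(supply)
    while remaining > eps:
        dist = [INF] * len(graph)
        prev = [None] * len(graph)  # (previous node, edge index)
        dist[src] = 0.0
        for _ in range(len(graph) - 1):
            updated = False
            for u, edges in enumerate(graph):
                if dist[u] == INF:
                    continue
                for k, (v, cap, c, _rev) in enumerate(edges):
                    if cap > eps and dist[u] + c < dist[v] - eps:
                        dist[v], prev[v] = dist[u] + c, (u, k)
                        updated = True
            if not updated:
                break
        if dist[snk] == INF:  # nothing more can be shipped
            break
        # Bottleneck capacity along the cheapest path, then augment along it.
        f, v = remaining, snk
        while v != src:
            u, k = prev[v]
            f = min(f, graph[u][k][1])
            v = u
        v = snk
        while v != src:
            u, k = prev[v]
            graph[u][k][1] -= f
            graph[v][graph[u][k][3]][1] += f
            v = u
        total += f * dist[snk]
        remaining -= f
    return total


def bag_emd(bag_a, bag_b, threshold=0.5):
    """EMD between two bags: bag_a supplies goods, bag_b consumes them."""
    suppliers = merge_similar(bag_a, threshold)
    consumers = merge_similar(bag_b, threshold)
    goods = [float(n) for _, n in suppliers]
    demand = [float(n) for _, n in consumers]
    # Hypothetical imbalance fix: rescale every supplier's goods so that
    # total supply equals total demand (multiplies goods up when short).
    scale = sum(demand) / sum(goods)
    goods = [g * scale for g in goods]
    cost = [[euclidean(p, q) for q, _ in consumers] for p, _ in suppliers]
    return transport_cost(goods, demand, cost) / sum(goods)
```

For example, `bag_emd([[0, 0]], [[0, 0], [10, 0]])` rescales the single supplier's goods from 1 to 2, ships one unit at cost 0 and one unit at cost 10, and returns 5.0.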

Abstract: In the setting of multi-instance learning, each sample is represented by a bag composed of multiple instances. Previous studies on clustering mainly deal with single instances in the traditional learning setting, so they cannot be applied to multi-instance problems directly. In this paper, a novel multi-instance clustering algorithm named ECMIL, based on the earth mover's distance, is presented. First, the similarity between the instances in each bag is calculated and similar instances are merged. Then the instances of the two bags being compared are regarded as suppliers and consumers, and the goods and capacities are calculated. The supplier-consumer imbalance problem is solved by multiplying the goods. Finally, k-medoids is used to cluster the multi-instance data. Experimental results on the MUSK, Corel and SIVAL data sets indicate that the ECMIL method is effective.

Key words: Multi-instance clustering, Earth mover's distance, k-medoids
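The final clustering step can be sketched in Python as well. The abstract does not say which k-medoids variant is used, so the sketch below is our assumption: the simple alternating scheme (assign each item to its nearest medoid, then move each medoid to the member with the smallest total distance to its cluster), not the full PAM swap search, with a deterministic farthest-point initialisation. `k_medoids` is a hypothetical name; it accepts any precomputed n-by-n matrix of pairwise distances, such as EMD values between bags.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster n items given an n-by-n distance matrix `dist`.

    Returns (medoid indices, cluster label of each item).
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")

    def assign(medoids):
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    # Deterministic start: the most central item, then repeatedly the item
    # farthest from the medoids chosen so far.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # empty cluster: keep its old medoid
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign(medoids)
```

For instance, with six toy items at positions 0, 1, 2, 10, 11 and 12 on a line (absolute differences standing in for bag EMDs), `k_medoids(d, 2)` returns medoids `[1, 4]` and labels `[0, 0, 0, 1, 1, 1]`. Because k-medoids needs only pairwise distances and always uses actual items as cluster centres, it suits bag-level distances such as EMD, for which no "mean bag" is defined.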
