Computer Science ›› 2012, Vol. 39 ›› Issue (7): 144-147.

• Databases and Data Mining •

Solving Multi-instance Learning Problems by Evaluating the Importance of Concepts in Instances

GAN Rui, YIN Jian

  1. (School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006)
  • Publication date: 2018-11-16 Release date: 2018-11-16

Solving Multi-instance Learning Problem with Evaluating the Importance of Concept in Instances

  • Online:2018-11-16 Published:2018-11-16

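Abstract: In multi-instance learning, each labeled sample in the training set is a bag composed of multiple instances, and the goal is to use this data set to train a classifier that can predict the labels of unlabeled bags. Previous research on multi-instance learning has either modified existing single-instance learning algorithms to suit multi-instance data, or proposed new methods that mine the relationship between instances and bags and use the mined results to solve the problem. Taking a change in the representation of the bag as its starting point, this paper proposes an algorithm for multi-instance learning, the concept evaluating algorithm. The algorithm first uses a clustering algorithm to group all instances into d clusters, each of which can be regarded as a concept contained in the instances. It then uses the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm, originally developed for text retrieval, to evaluate the importance of each concept in each bag. Finally, it represents each bag as a d-dimensional vector, the concept evaluating vector, whose i-th entry gives the importance within that bag of the concept represented by the i-th cluster. After this re-representation the original multi-instance data set is no longer "multi-instance", so existing single-instance learning algorithms can be used to solve the multi-instance learning problem efficiently.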

Keywords: Multi-instance learning, Re-representation, Single-instance learning, Concept evaluating

Abstract: In multi-instance learning, the training set is composed of labeled bags, each of which consists of many unlabeled instances, and the goal is to learn a classifier from the training set that correctly labels unseen bags. Previous research on multi-instance learning has either adapted single-instance learning algorithms to the multi-instance representation or proposed new methods that find the relationship between instances and bags and use the result to solve the problem. This paper starts from changing the representation of the bag and proposes a new algorithm, the concept evaluating algorithm. First, the algorithm uses a clustering algorithm to group all instances into d clusters, each of which can be treated as a concept in the instances. Then it uses the TF-IDF (term frequency-inverse document frequency) algorithm to obtain the importance of each concept in each bag. Finally, each bag is re-represented as a d-dimensional vector, the concept evaluating vector, whose i-th value is the importance of the i-th cluster in the bag. Because the re-represented data set is no longer "multi-instance", propositional single-instance learning algorithms can be used to solve the multi-instance learning problem effectively.

Key words: Multi-instance learning, Re-represent, Single-instance learning, Concept evaluating
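
The three steps described in the abstract (cluster all instances, score each cluster per bag with TF-IDF, emit one d-dimensional vector per bag) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which clustering algorithm or which TF-IDF variant is used, so the plain k-means routine, the weighting tf = (instances of the bag in cluster j) / (bag size) and idf = log(number of bags / number of bags touching cluster j), and the names `nearest`, `kmeans`, and `concept_vectors` are all assumptions made here for illustration.

```python
import math
import random


def nearest(p, centers):
    """Index of the center closest to point p (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
    )


def kmeans(points, d, iters=20, seed=0):
    """Plain k-means; an assumed choice, the abstract only says 'a clustering algorithm'."""
    rng = random.Random(seed)
    centers = rng.sample(points, d)
    for _ in range(iters):
        groups = [[] for _ in range(d)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # New center = mean of the cluster members; an empty cluster keeps its old center.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def concept_vectors(bags, d, seed=0):
    """Re-represent each (non-empty) bag as a d-dimensional TF-IDF vector over clusters."""
    all_instances = [inst for bag in bags for inst in bag]
    centers = kmeans(all_instances, d, seed=seed)
    # counts[b][j] = number of instances of bag b assigned to cluster ("concept") j
    counts = []
    for bag in bags:
        row = [0] * d
        for inst in bag:
            row[nearest(inst, centers)] += 1
        counts.append(row)
    # df[j] = number of bags with at least one instance in cluster j
    df = [sum(1 for row in counts if row[j] > 0) for j in range(d)]
    idf = [math.log(len(bags) / df[j]) if df[j] else 0.0 for j in range(d)]
    return [
        [(row[j] / len(bag)) * idf[j] for j in range(d)]
        for row, bag in zip(counts, bags)
    ]


# Toy data (hypothetical): three bags of 2-D instances.
bags = [
    [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1)],
    [(0.1, 0.0), (0.0, 0.2)],
    [(0.1, 0.1), (0.2, 0.2)],
]
vectors = concept_vectors(bags, d=2)
```

Each row of `vectors` is an ordinary fixed-length feature vector, so it can be paired with its bag label and passed to any standard single-instance classifier. Under this particular weighting, a concept that occurs in every bag receives an IDF of zero and therefore does not help distinguish bags.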
