Computer Science ›› 2012, Vol. 39 ›› Issue (5): 190-194.

• Artificial Intelligence •

Classification for Imbalanced Microarray Data Based on Oversampling Technology and Random Forest

Yu Hualong, Gao Shang, Zhao Jing, Qin Bin

  1. (School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003) (College of Computer Science and Technology, Harbin Engineering University, Harbin 150001)
  • Online: 2018-11-16 Published: 2018-11-16

Abstract: In recent years, using DNA microarray technology to diagnose disease, especially cancer, has become one of the hot topics in bioinformatics. Compared with other data carriers, microarray data generally has some unique characteristics. To address one of them, the imbalanced distribution of samples across classes, an oversampling technique based on probability distribution is proposed. The technique creates reasonable pseudo samples for the minority class so that the class sizes are balanced, and a random forest classifier is then used to classify the samples. The effectiveness and feasibility of the method were verified on two benchmark microarray datasets. Experimental results show that the proposed method achieves better classification performance than traditional approaches.

Key words: Microarray data, Sample distribution imbalance, Oversampling technology, Probability distribution, Random forest
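
The abstract does not say which probability distribution is used to generate pseudo samples. As a rough illustration only, the sketch below assumes each gene's expression level in the minority class follows an independent Gaussian fitted to the observed minority samples; the function name `oversample_minority`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import random
import statistics


def oversample_minority(minority, n_target, seed=0):
    """Return the minority samples plus pseudo samples, n_target rows in total.

    Assumption (not from the paper): every feature (gene) of the minority
    class is modelled as an independent Gaussian whose mean and standard
    deviation are estimated from the observed minority samples.
    """
    rng = random.Random(seed)
    columns = list(zip(*minority))  # one tuple of values per feature
    means = [statistics.fmean(col) for col in columns]
    # stdev needs at least two values; fall back to 0.0 for a single sample
    stdevs = [statistics.stdev(col) if len(col) > 1 else 0.0 for col in columns]
    n_new = max(0, n_target - len(minority))
    pseudo = [
        [rng.gauss(mu, sigma) for mu, sigma in zip(means, stdevs)]
        for _ in range(n_new)
    ]
    return [list(row) for row in minority] + pseudo


# Toy data: 5 majority samples and 2 minority samples with 3 "genes" each.
majority = [
    [1.0, 2.0, 0.5],
    [1.2, 1.9, 0.4],
    [0.9, 2.2, 0.6],
    [1.1, 2.1, 0.5],
    [1.0, 1.8, 0.7],
]
minority = [[3.0, 0.5, 1.5], [3.4, 0.7, 1.2]]

balanced_minority = oversample_minority(minority, n_target=len(majority))
X = majority + balanced_minority
y = [0] * len(majority) + [1] * len(balanced_minority)
```

The balanced `X` and `y` could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`. Whether this matches the distribution model and classifier settings used in the paper cannot be determined from the abstract.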
