Computer Science ›› 2012, Vol. 39 ›› Issue (5): 190-194.
Abstract: In recent years, applying DNA microarray technology to disease diagnosis, especially cancer diagnosis, has become a hot topic in bioinformatics. Compared with many other kinds of data, microarray data have some unique characteristics. One of them is an imbalanced sample distribution, and a novel oversampling technique based on probability distribution is proposed to address the problems it causes. With this technique, reasonable pseudo samples are created for the minority class to balance the two classes. A random forest is then used to classify the samples of the different classes. The effectiveness and feasibility of the method were verified on two benchmark microarray datasets. Experimental results show that the proposed method achieves better classification performance than some traditional approaches.
Key words: Microarray data, Sample distribution imbalance, Oversampling technology, Probability distribution, Random forest
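The abstract does not say how the pseudo samples are drawn, so the following is only a minimal sketch of one way probability-distribution-based oversampling could look. It assumes, purely for illustration, that each gene is modelled as an independent Gaussian fitted to the real minority samples. The function name oversample_minority, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import random
import statistics


def oversample_minority(minority, n_target, seed=0):
    """Return the minority samples padded with pseudo samples up to n_target rows.

    Assumption (not from the paper): every gene (column) follows an
    independent Gaussian whose mean and standard deviation are estimated
    from the real minority samples.
    """
    rng = random.Random(seed)
    n_features = len(minority[0])

    # Estimate a per-gene mean and standard deviation from the real samples.
    columns = [[row[j] for row in minority] for j in range(n_features)]
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]

    # Draw as many pseudo samples as are needed to reach n_target rows.
    n_missing = max(0, n_target - len(minority))
    pseudo = [
        [rng.gauss(m, s) for m, s in zip(means, stds)]
        for _ in range(n_missing)
    ]
    return minority + pseudo


# Toy data: 3 minority samples with 2 genes each, balanced against 8 majority samples.
minority = [[2.1, 0.3], [1.9, 0.5], [2.0, 0.4]]
balanced_minority = oversample_minority(minority, n_target=8)
print(len(balanced_minority))  # 8
```

The real samples are kept unchanged and only the shortfall is synthesized. The balanced set could then be passed, together with the majority class, to any random forest implementation for the classification step.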
URL: https://www.jsjkx.com/EN/
https://www.jsjkx.com/EN/Y2012/V39/I5/190