Computer Science ›› 2015, Vol. 42 ›› Issue (Z6): 453-458.


Feature Extraction Method Based on Sparse Principal Components for Gene Expression Data

SHEN Ning-min, LI Jing, ZHOU Pei-yun and ZHUANG Yi   

  • Online: 2018-11-14 Published: 2018-11-14

Abstract: Cluster analysis is widely used on gene expression data, where it can help identify cancer cells so that diseases can be diagnosed accurately and rapidly from gene class labels. However, such data typically have many attributes and few samples, which introduces a large amount of redundant or noisy information and lowers the accuracy of clustering applied directly to the high-dimensional data. Principal Component Analysis (PCA) is a classical dimension-reduction method that maps high-dimensional data into a low-dimensional space while retaining maximal variance. Its shortcoming is that the loadings are not sparse and are therefore hard to interpret. In this paper, a sparse PCA method based on the truncated power method is applied to feature extraction for gene expression data, and the resulting sparse principal components are fed into K-means for clustering. Finally, experimental results on three typical gene datasets (colon cancer, leukemia and lung cancer) verify that the sparse gene data can improve the efficiency and accuracy of clustering.

Key words: Gene expression data, Loadings, Truncated power, Sparse principal component analysis, Feature extraction
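
The pipeline outlined in the abstract (a truncated power iteration that yields a sparse loading vector, followed by K-means on the projected samples) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the initialisation on the highest-variance gene, the use of a single component without deflation, the hand-picked K-means starting centroids, and all function names are assumptions made for illustration only.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix (genes x genes) of a list of sample rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def truncated_power(A, k, iters=200, tol=1e-10):
    """Power iteration that keeps only the k largest-magnitude entries per step."""
    p = len(A)
    # Assumed initialisation: unit vector on the gene with the largest variance.
    start = max(range(p), key=lambda i: A[i][i])
    x = [1.0 if i == start else 0.0 for i in range(p)]
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(p)) for i in range(p)]
        # Truncate: zero everything except the k largest entries in magnitude.
        keep = set(sorted(range(p), key=lambda i: abs(y[i]), reverse=True)[:k])
        z = [y[i] if i in keep else 0.0 for i in range(p)]
        # Rescale to unit length so the result is a valid loading vector.
        norm = math.sqrt(sum(v * v for v in z))
        z = [v / norm for v in z]
        converged = max(abs(a - b) for a, b in zip(x, z)) < tol
        x = z
        if converged:
            break
    return x


def kmeans(points, centers, iters=100):
    """Plain Lloyd's algorithm; `centers` holds the initial centroids."""
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [min(range(len(centers)),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centers[c])))
                  for pt in points]
        new_centers = []
        for c in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Synthetic stand-in for expression data: 20 samples, 30 genes, two classes
# that differ only on the first 5 genes (hypothetical values, not real data).
rng = random.Random(0)
n_genes, n_informative = 30, 5
samples = []
for s in range(20):
    shift = 2.0 if s < 10 else -2.0
    samples.append([(shift if g < n_informative else 0.0) + rng.gauss(0.0, 0.3)
                    for g in range(n_genes)])

loading = truncated_power(covariance(samples), k=n_informative)
support = [g for g, w in enumerate(loading) if w != 0.0]
# Project every sample onto the sparse loading, then cluster the 1-D scores.
scores = [[sum(w * v for w, v in zip(loading, row))] for row in samples]
labels = kmeans(scores, centers=[scores[0], scores[-1]])
print("selected genes:", support)
print("cluster labels:", labels)
```

Because the loading vector has only k nonzero entries, the genes that drive the component can be read directly from `support`, which is the interpretability advantage over dense PCA loadings that the abstract points to. Extracting several sparse components would additionally need a deflation step, which this sketch omits.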

