Computer Science ›› 2019, Vol. 46 ›› Issue (6A): 423-426.

• Big Data & Data Mining •

Linear Discriminant Analysis of High-dimensional Data Using Random Matrix Theory

LIU Peng, YE Bin   

  1. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  • Online: 2019-06-14 Published: 2019-07-02

Abstract: Linear discriminant analysis (LDA) is an important theoretical and analytic tool for many machine learning and data mining tasks. As a parametric classification method, it performs well in many applications. However, LDA is impractical for the high-dimensional data sets that are now routinely generated. A primary reason for its poor performance on such data is that the sample covariance matrix is no longer a good estimator of the population covariance matrix when the dimension of the feature vector is close to, or even larger than, the sample size. This paper therefore proposes a regularization method for high-dimensional classifiers based on random matrix theory. First, a consistent estimate of the high-dimensional covariance matrix is obtained through rotationally invariant estimation and eigenvalue clipping. Second, the estimated covariance matrix is used to compute the value of the discriminant function. Numerical experiments on artificial data sets, as well as on real-world data sets such as microarray data, demonstrate that the proposed discriminant analysis method is more widely applicable and yields higher accuracy than existing competitors.
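
The two steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the simplest form of eigenvalue clipping, in which eigenvalues of the sample correlation matrix below the Marchenko-Pastur upper edge (1 + sqrt(p/n))^2 are replaced by their average, and it plugs the cleaned matrix into the standard LDA discriminant function. The function names `clipped_covariance`, `fit_lda`, and `predict_lda` are hypothetical, and the paper's rotationally invariant estimator may differ in its details.

```python
import numpy as np

def clipped_covariance(X):
    """Eigenvalue-clipped covariance estimate for an (n, p) data matrix.

    Assumption: eigenvalues of the sample correlation matrix at or below the
    Marchenko-Pastur upper edge are treated as noise and replaced by their
    mean, which keeps the trace unchanged.
    """
    n, p = X.shape
    X = X - X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                       # guard against constant features
    Z = X / std
    corr = (Z.T @ Z) / (n - 1)                # sample correlation matrix
    evals, evecs = np.linalg.eigh(corr)
    lam_plus = (1.0 + np.sqrt(p / n)) ** 2    # Marchenko-Pastur upper edge
    noise = evals <= lam_plus
    cleaned = evals.copy()
    if noise.any():
        cleaned[noise] = evals[noise].mean()
    corr_clean = (evecs * cleaned) @ evecs.T  # keep the sample eigenvectors
    return corr_clean * np.outer(std, std)    # rescale back to a covariance

def fit_lda(X, y):
    """Estimate class means, priors, and a cleaned pooled covariance."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    priors = np.array([(y == k).mean() for k in classes])
    resid = X - means[np.searchsorted(classes, y)]  # within-class residuals
    sigma = clipped_covariance(resid)
    return classes, means, priors, sigma

def predict_lda(model, X):
    """Assign each row of X to the class with the largest discriminant score
    x^T S^-1 mu_k - 0.5 * mu_k^T S^-1 mu_k + log(pi_k)."""
    classes, means, priors, sigma = model
    A = np.linalg.solve(sigma, means.T)       # columns are S^-1 mu_k
    scores = X @ A - 0.5 * np.sum(means.T * A, axis=0) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]
```

Because clipping replaces the zero eigenvalues that appear when the dimension exceeds the sample size with a positive average, the cleaned matrix can usually be inverted where the raw sample covariance cannot. The sketch does not handle the degenerate case in which every nonzero eigenvalue lies above the clipping threshold.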

Key words: Classification, Covariance matrix, High-dimensional data, Linear discriminant analysis, Random matrix theory

CLC Number: TP181