Computer Science, 2019, Vol. 46, Issue (6A): 423-426.

• Big Data & Data Mining •

Linear Discriminant Analysis of High-dimensional Data Using Random Matrix Theory

LIU Peng, YE Bin   

  1. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  • Online: 2019-06-14  Published: 2019-07-02

Abstract: Linear discriminant analysis (LDA) is an important theoretical and analytical tool for many machine learning and data mining tasks. As a parametric classification method, it performs well in many applications. However, LDA is impractical for the high-dimensional data sets that are now routinely generated throughout modern society. A primary reason LDA performs poorly on high-dimensional data is that the sample covariance matrix is no longer a good estimator of the population covariance matrix when the dimension of the feature vector is close to, or even larger than, the sample size. This paper therefore proposes a regularization method for high-dimensional data classifiers based on random matrix theory. First, a consistent estimate of the high-dimensional covariance matrix is obtained through rotation-invariant estimation and eigenvalue clipping. Second, the estimated high-dimensional covariance matrix is used to compute the value of the discriminant function. Numerical experiments on artificial datasets, as well as on real-world datasets such as microarray datasets, demonstrate that the proposed discriminant analysis method applies more widely and achieves higher accuracy than existing competing methods.
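The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the estimator, so the code below assumes a common textbook form of eigenvalue clipping, in which sample eigenvalues below the Marchenko-Pastur upper edge are replaced by their mean while the sample eigenvectors are kept, followed by the standard LDA discriminant function. The function names `clip_covariance`, `fit_lda`, and `predict_lda`, and the choice of the average variance as the noise scale, are assumptions made for illustration.

```python
import numpy as np


def clip_covariance(X):
    """Eigenvalue-clipped covariance estimate for X (n samples x p features).

    Assumption: eigenvalues at or below the Marchenko-Pastur upper edge are
    treated as noise and replaced by their mean, which preserves the trace.
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False)                    # p x p sample covariance
    evals, evecs = np.linalg.eigh(S)               # eigenvalues in ascending order
    sigma2 = np.trace(S) / p                       # noise scale (assumed: mean variance)
    lam_plus = sigma2 * (1.0 + np.sqrt(p / n)) ** 2  # Marchenko-Pastur upper edge
    noise = evals <= lam_plus
    cleaned = evals.copy()
    if noise.any():
        cleaned[noise] = evals[noise].mean()
    # Rebuild the matrix with the sample eigenvectors left unchanged.
    return (evecs * cleaned) @ evecs.T


def fit_lda(X, y):
    """Fit LDA using the clipped pooled covariance in place of the sample one."""
    classes, idx = np.unique(y, return_inverse=True)
    means = np.array([X[idx == k].mean(axis=0) for k in range(len(classes))])
    priors = np.bincount(idx) / len(y)
    centered = X - means[idx]                      # remove each sample's class mean
    sigma = clip_covariance(centered)
    coef = np.linalg.solve(sigma, means.T).T       # row k holds Sigma^{-1} mu_k
    intercept = -0.5 * np.sum(means * coef, axis=1) + np.log(priors)
    return classes, coef, intercept


def predict_lda(model, X):
    """Assign each row of X to the class with the largest discriminant value."""
    classes, coef, intercept = model
    scores = X @ coef.T + intercept                # delta_k(x) for every class k
    return classes[np.argmax(scores, axis=1)]
```

A call such as `predict_lda(fit_lda(X_train, y_train), X_test)` returns the predicted labels. When the dimension exceeds the sample size by a large margin, the clipped matrix can still be singular in this simple form, so a practical version would need an additional floor on the eigenvalues.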

Key words: Classification, Covariance matrix, High-dimensional data, Linear discriminant analysis, Random matrix theory

CLC Number: TP181