Computer Science ›› 2014, Vol. 41 ›› Issue (9): 239-242. doi: 10.11896/j.issn.1002-137X.2014.09.045

• Artificial Intelligence •

Big Data Oriented Online Feature Extraction

XU Shuo-na, ZENG Bi-qing, XIONG Fang-min

  1. School of Software, South China Normal University, Foshan 528225
  • Online: 2018-11-14  Published: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (71272144), the Guangzhou Science and Technology Plan Project (2013KP084), and the Natural Science Foundation of Guangdong Province (8151063101000040)

Abstract: In big data settings, the high dimensionality of training data severely limits the performance of machine learning classification algorithms. Exploiting the sparsity induced by the L1 norm, this paper proposes an online feature extraction algorithm and uses it to classify training instances. Experiments on public datasets show that the proposed algorithm classifies training instances accurately and achieves better prediction accuracy than related work, making it well suited to data mining on big data.

Key words: Big data, Machine learning, Online feature extraction, Algorithm
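
The abstract does not spell out the steps of the proposed algorithm, so the following is only a minimal illustrative sketch of the general idea it names: online learning in which an L1 penalty shrinks the weights of uninformative features toward zero. It uses a generic proximal (soft-thresholding) stochastic gradient update on logistic loss, which is an assumed stand-in and not the authors' method. The function names, the synthetic data stream, and the hyperparameters `lr` and `l1` are all hypothetical.

```python
import math
import random

def soft_threshold(w, tau):
    """Shrink w toward zero by tau; values within [-tau, tau] become exactly 0."""
    if w > tau:
        return w - tau
    if w < -tau:
        return w + tau
    return 0.0

def online_l1_logistic(stream, dim, lr=0.1, l1=0.05):
    """Single pass over (x, y) pairs with y in {-1, +1}; returns the weight vector.

    Assumed update (not taken from the paper): one gradient step on the
    logistic loss, followed by soft-thresholding with threshold lr * l1.
    """
    w = [0.0] * dim
    for x, y in stream:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        margin = max(-30.0, min(30.0, margin))  # clip to keep exp() in range
        # d/dw log(1 + exp(-margin)) = -y * x / (1 + exp(margin))
        g = -y / (1.0 + math.exp(margin))
        w = [soft_threshold(wi - lr * g * xi, lr * l1) for wi, xi in zip(w, x)]
    return w

def make_stream(n, dim, informative, seed=0):
    """Hypothetical synthetic stream: the label depends only on the informative features."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        score = sum(x[j] for j in informative)
        yield x, (1 if score >= 0 else -1)

weights = online_l1_logistic(make_stream(5000, 20, [0, 1]), dim=20)

# Rank features by weight magnitude; the L1 shrinkage keeps the weights of
# uninformative features small, so the informative ones should rank first.
ranked = sorted(range(len(weights)), key=lambda j: -abs(weights[j]))
top_two = sorted(ranked[:2])
```

Because each example is processed once and then discarded, memory use stays proportional to the number of features, which is what makes this style of update attractive for large data streams. Plain soft-thresholding does not guarantee exact zeros for every noisy feature at every step, so this sketch selects features by ranking weight magnitudes rather than by testing for exact zeros.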

