计算机科学 (Computer Science), 2022, Vol. 49, Issue 7: 73-78. doi: 10.11896/jsjkx.210500092
胡艳羽, 赵龙, 董祥军
HU Yan-yu, ZHAO Long, DONG Xiang-jun
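Abstract: Cancer is one of the deadliest diseases in the world. Applying machine learning to gene microarray data plays an important role in supporting early cancer diagnosis. In microarray datasets, however, the number of gene features far exceeds the number of samples, and this mismatch between dimensionality and sample size degrades both the efficiency and the accuracy of classification, which makes feature selection on microarray data especially important. Most existing feature selection algorithms rely on a single selection criterion, rarely consider feature extraction, and mostly use long-established neural networks, so their classification accuracy is relatively low. This paper therefore proposes a Two-Stage Deep Feature Selection (TSDFS) algorithm. In the first stage, three feature selection algorithms are combined to perform comprehensive feature selection and obtain a feature subset; in the second stage, an unsupervised neural network learns the best representation of that subset, which improves the final classification accuracy. The effectiveness of TSDFS is analyzed by comparing classification performance before and after feature selection and by comparing TSDFS with other feature selection algorithms. Experimental results show that TSDFS reduces the number of features while maintaining or improving classification accuracy.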
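The two stages described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' TSDFS implementation: the abstract does not name the three feature selection algorithms, the rule used to combine them, or the network architecture, so every choice below is an assumption. Stage 1 uses three stand-in filter scores (a one-way ANOVA F statistic, basic Relief weights, and absolute Pearson correlation with the label) and keeps the k features with the best mean rank across them; stage 2 uses a small linear autoencoder, trained by stochastic gradient descent on squared reconstruction error, as a stand-in for the unsupervised neural network. All function names (`anova_f`, `relief`, `abs_corr`, `ensemble_select`, `train_autoencoder`, `encode`, `decode`, `recon_error`) are hypothetical.

```python
import math
import random

# Stage 1 (sketch): the three scorers below are assumed stand-ins, not the paper's choices.

def anova_f(X, y):
    """One-way ANOVA F statistic of every feature against the class label."""
    n, classes = len(X), sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        grand = sum(col) / n
        between = within = 0.0
        for c in classes:
            vals = [col[i] for i in range(n) if y[i] == c]
            m = sum(vals) / len(vals)
            between += len(vals) * (m - grand) ** 2
            within += sum((v - m) ** 2 for v in vals)
        df_b, df_w = len(classes) - 1, n - len(classes)
        scores.append((between / df_b) / (within / df_w + 1e-12))
    return scores

def relief(X, y):
    """Basic Relief: reward features that differ on the nearest miss, not the nearest hit."""
    n, d = len(X), len(X[0])
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def dist(a, b):
        # Range-normalized Manhattan distance between two samples.
        return sum(abs(a[j] - b[j]) / span[j] for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hit = min((k for k in range(n) if k != i and y[k] == y[i]), key=lambda k: dist(X[i], X[k]))
        miss = min((k for k in range(n) if y[k] != y[i]), key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (abs(X[i][j] - X[miss][j]) - abs(X[i][j] - X[hit][j])) / (span[j] * n)
    return w

def abs_corr(X, y):
    """Absolute Pearson correlation between every feature and a numeric (0/1) label."""
    n = len(X)
    my = sum(y) / n
    sy = math.sqrt(sum((v - my) ** 2 for v in y)) or 1.0
    out = []
    for j in range(len(X[0])):
        col = [r[j] for r in X]
        mx = sum(col) / n
        sx = math.sqrt(sum((v - mx) ** 2 for v in col)) or 1.0
        out.append(abs(sum((a - mx) * (b - my) for a, b in zip(col, y)) / (sx * sy)))
    return out

def ensemble_select(X, y, k):
    """Return the sorted indices of the k features with the best mean rank over the scorers."""
    d = len(X[0])
    mean_rank = [0.0] * d
    for scores in (anova_f(X, y), relief(X, y), abs_corr(X, y)):
        for rank, j in enumerate(sorted(range(d), key=lambda j: -scores[j])):  # best first
            mean_rank[j] += rank / 3
    return sorted(sorted(range(d), key=lambda j: mean_rank[j])[:k])

# Stage 2 (sketch): a linear autoencoder is an assumed stand-in for the unsupervised network.

def train_autoencoder(Z, h, epochs=300, lr=0.005, seed=0):
    """Train encoder W (h x k) and decoder V (k x h) by SGD on 0.5 * ||V W x - x||^2."""
    rnd = random.Random(seed)
    k = len(Z[0])
    W = [[rnd.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(h)]
    V = [[rnd.uniform(-0.5, 0.5) for _ in range(h)] for _ in range(k)]
    for _ in range(epochs):
        for x in Z:
            code = encode(W, x)
            err = [r - v for r, v in zip(decode(V, code), x)]  # gradient w.r.t. the reconstruction
            g_code = [sum(V[j][a] * err[j] for j in range(k)) for a in range(h)]  # uses V before its update
            for j in range(k):
                for a in range(h):
                    V[j][a] -= lr * err[j] * code[a]
            for a in range(h):
                for j in range(k):
                    W[a][j] -= lr * g_code[a] * x[j]
    return W, V

def encode(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def decode(V, code):
    return [sum(v * c for v, c in zip(row, code)) for row in V]

def recon_error(W, V, Z):
    """Mean squared reconstruction error over the samples in Z."""
    return sum(sum((r - v) ** 2 for r, v in zip(decode(V, encode(W, x)), x)) for x in Z) / len(Z)
```

In use, one would call `ensemble_select(X, y, k)` on a samples-by-genes matrix `X` with 0/1 labels `y`, keep only the returned columns, train the autoencoder on that subset, and pass `encode(W, x)` for each sample to a downstream classifier. Mean-rank aggregation is used here because the three scores live on very different scales; with thousands of genes, a real pipeline would use vectorized libraries instead of these O(n²·d) loops.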