Computer Science, 2022, Vol. 49, Issue (7): 73-78. doi: 10.11896/jsjkx.210500092

• Database & Big Data & Data Science •

Two-stage Deep Feature Selection Extraction Algorithm for Cancer Classification

HU Yan-yu, ZHAO Long, DONG Xiang-jun   

  1. College of Computer Science and Technology, Qilu University of Technology, Jinan 250353, China
  • Received: 2021-05-03 Revised: 2021-09-09 Online: 2022-07-15 Published: 2022-07-12
  • About author: HU Yan-yu, born in 1996, master. Her main research interests include deep feature selection.
    ZHAO Long, born in 1984, Ph.D., lecturer, master's supervisor. His main research interests include image processing, machine learning and knowledge discovery.
  • Supported by:
    National Natural Science Foundation of China (62076143, 61806105) and Natural Science Foundation of Shandong Province (ZR2017LF020).

Abstract: Cancer is one of the deadliest diseases in the world. Machine learning on microarray data plays an important role in assisting the early diagnosis of cancer, but the number of gene features far exceeds the number of samples. This imbalance degrades both the efficiency and the accuracy of classification, so feature selection on gene array data is important. Most existing feature selection algorithms select features under a single criterion and seldom consider feature extraction; most of them also rely on long-established neural networks and achieve low classification accuracy. To address this, a two-stage deep feature selection (TSDFS) algorithm is proposed. The first stage aggregates three feature selection algorithms to perform comprehensive feature selection and obtain a feature subset. The second stage uses an unsupervised neural network to obtain the best representation of that subset and improve the final classification accuracy. The effectiveness of TSDFS is analyzed by comparing classification performance before and after feature selection and against different feature selection algorithms. Experimental results show that TSDFS reduces the number of features while maintaining or improving classification accuracy.

Key words: Deep learning, Feature selection, Microarray data, Random forest, Variational auto-encoder

CLC Number: 

  • TP302
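
The first stage described in the abstract, aggregating three feature selection algorithms into one feature subset, can be illustrated with a minimal sketch. The abstract names neither the three selectors nor the aggregation rule, so everything below is an assumption made for illustration: the stand-in scorers (a one-way ANOVA F statistic, a signal-to-noise ratio, and basic Relief weights), the majority-vote rule, the helper names `anova_f`, `signal_to_noise`, `relief_weights`, `top_k` and `vote_select`, and the toy data. It is not the TSDFS implementation. The second stage (an unsupervised network that re-encodes the selected subset) is not sketched because it would need a deep-learning framework.

```python
from collections import Counter
from statistics import mean, pstdev


def anova_f(column, labels):
    """One-way ANOVA F statistic of one feature against the class labels."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    grand = mean(column)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (len(groups) - 1)) / (within / (len(column) - len(groups)))


def signal_to_noise(column, labels):
    """|difference of class means| / (sum of class std devs); labels are 0/1."""
    a = [v for v, y in zip(column, labels) if y == 0]
    b = [v for v, y in zip(column, labels) if y == 1]
    spread = pstdev(a) + pstdev(b)
    return abs(mean(a) - mean(b)) / spread if spread > 0 else 0.0


def relief_weights(rows, labels):
    """Basic Relief: reward features that differ on the nearest miss and
    agree on the nearest hit (range-scaled Manhattan distance)."""
    spans = [(max(col) - min(col)) or 1.0 for col in zip(*rows)]

    def dist(a, b):
        return sum(abs(x - y) / s for x, y, s in zip(a, b, spans))

    weights = [0.0] * len(spans)
    for i, row in enumerate(rows):
        hits = [r for j, r in enumerate(rows) if j != i and labels[j] == labels[i]]
        misses = [r for j, r in enumerate(rows) if labels[j] != labels[i]]
        hit = min(hits, key=lambda r: dist(row, r))
        miss = min(misses, key=lambda r: dist(row, r))
        for f, s in enumerate(spans):
            weights[f] += (abs(row[f] - miss[f]) - abs(row[f] - hit[f])) / s
    return [w / len(rows) for w in weights]


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def vote_select(score_lists, k, min_votes=2):
    """Keep features that are in the top k of at least min_votes selectors."""
    votes = Counter()
    for scores in score_lists:
        votes.update(top_k(scores, k))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical toy data: 6 samples x 4 features, classes 0 and 1.
# Features 0 and 2 separate the classes; features 1 and 3 are noise.
rows = [
    (1.0, 3.0, 2.0, 0.5),
    (1.2, 1.0, 2.5, 0.7),
    (0.9, 2.0, 3.0, 0.6),
    (5.0, 2.0, 3.5, 0.6),
    (5.1, 3.0, 4.0, 0.5),
    (4.8, 1.0, 4.5, 0.7),
]
labels = [0, 0, 0, 1, 1, 1]
columns = list(zip(*rows))

score_lists = [
    [anova_f(c, labels) for c in columns],
    [signal_to_noise(c, labels) for c in columns],
    relief_weights(rows, labels),
]
selected = vote_select(score_lists, k=2)
print(selected)  # [0, 2]
```

Majority voting is only one plausible way to combine selectors; a union (`min_votes=1`) keeps more features, while an intersection (`min_votes=3`) keeps only those on which all three selectors agree.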