Computer Science ›› 2021, Vol. 48 ›› Issue (3): 206-213. doi: 10.11896/jsjkx.200200081

• Artificial Intelligence •

Prediction of Protein Subcellular Localization Based on Clustering and Feature Fusion

WANG Yi-hao, DING Hong-wei, LI Bo, BAO Li-yong, ZHANG Ying-jie   

  1. School of Information Science and Engineering,Yunnan University,Kunming 650500,China
  • Received:2020-02-16 Revised:2020-05-21 Online:2021-03-15 Published:2021-03-05
  • About author: WANG Yi-hao, born in 1995, postgraduate, is a member of China Computer Federation. His main research interests include machine learning and computer vision.
    DING Hong-wei, born in 1964, Ph.D, professor, Ph.D supervisor. His main research interests include multiple access communication and machine learning.
  • Supported by:
    National Natural Science Foundation of China(61461053,61461054).

Abstract: Predicting protein subcellular localization is an important basis for studying protein structure and function, and it is also significant for understanding the pathogenesis of some diseases and for drug design and discovery. However, using machine learning to predict protein subcellular localization accurately remains a challenging problem. To address it, this paper proposes a protein subcellular localization prediction method based on clustering and feature fusion. First, the autocorrelation coefficient method and the entropy density method are introduced into the construction of the protein feature representation model, and an improved PseAAC (pseudo-amino acid composition) method is proposed on the basis of the traditional PseAAC. To represent protein sequence information better, the autocorrelation coefficient features, the entropy density features and the improved PseAAC features are fused into a new protein sequence representation model. Second, principal component analysis (PCA) is used to reduce the dimension of the fused feature vector. Third, the LibD3C ensemble classifier is used to predict protein subcellular localization, and the prediction accuracy is evaluated by leave-one-out cross validation on Gram-positive and Gram-negative datasets. Finally, the experimental results are compared with those of existing algorithms. The new method achieves prediction accuracies of 99.24% and 95.33% on the Gram-positive and Gram-negative datasets, respectively, which indicates that it is effective.

Key words: Autocorrelation coefficient, Clustering, Feature fusion, Principal component analysis, Pseudo-amino acid composition

CLC Number: 

  • TP391
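
The pipeline summarized in the abstract (fuse several sequence descriptors, then evaluate a classifier by leave-one-out cross validation) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical. It rests on several assumptions: plain amino acid composition stands in for the improved PseAAC, the Kyte-Doolittle hydropathy scale is the property used for the autocorrelation coefficients, entropy density is taken as each residue's Shannon entropy term divided by the total entropy, and a 1-nearest-neighbour rule replaces the LibD3C ensemble. The PCA step is omitted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumption: the paper may use another index).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """20-dim amino acid frequencies (simplified stand-in for PseAAC)."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def autocorrelation(seq, max_lag=3):
    """r_k = 1/(L-k) * sum_i h_i * h_(i+k), k = 1..max_lag.

    Requires len(seq) > max_lag and only the 20 standard residues.
    """
    h = [HYDROPATHY[a] for a in seq]
    n = len(h)
    return [
        sum(h[i] * h[i + k] for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    ]


def entropy_density(seq):
    """Per-residue terms -p*log2(p), normalised by the total entropy."""
    terms = [-p * math.log2(p) if p > 0 else 0.0 for p in composition(seq)]
    total = sum(terms)
    return [t / total if total > 0 else 0.0 for t in terms]


def fused_features(seq, max_lag=3):
    """Feature fusion by simple concatenation of the three descriptors."""
    return composition(seq) + autocorrelation(seq, max_lag) + entropy_density(seq)


def nearest_label(x, train):
    """1-nearest-neighbour prediction (stand-in for the LibD3C ensemble)."""
    return min(train, key=lambda item: math.dist(x, item[0]))[1]


def leave_one_out_accuracy(samples):
    """samples: list of (feature_vector, label); each sample is held out once."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += nearest_label(x, rest) == y
    return hits / len(samples)
```

With `max_lag=3` the fused vector has 20 + 3 + 20 = 43 dimensions. In a fuller version, the descriptors would typically be rescaled before concatenation, since the hydropathy products are much larger than the frequency terms, and a dimensionality reduction step such as PCA would be applied before classification.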