Computer Science ›› 2021, Vol. 48 ›› Issue (2): 70-75. doi: 10.11896/jsjkx.200500156

• New Distributed Computing Technologies and Systems •

SpaRC Algorithm Hyperparameter Optimization Methodology Based on TPE

DENG Li, WU Jin-da, LI Ke-xue, LU Ya-kang   

  1. School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200072, China; Shanghai Key Laboratory of Power Station Automation Technology, Shanghai 200072, China
  • Received: 2020-05-29 Revised: 2020-12-05 Online: 2021-02-15 Published: 2021-02-04
  • About author: DENG Li, born in 1978, associate professor. Her main research interests include metagenomic data analysis and machine learning.
  • Supported by:
    The National Natural Science Foundation of China (61802246).

Abstract: The assembly of metagenomic sequences poses major challenges in computing and storage. SpaRC (Spark Reads Clustering) is a metagenomic sequence fragment clustering algorithm based on Apache Spark that provides a scalable solution for clustering billions of sequencing fragments. However, configuring SpaRC is difficult: the algorithm has many hyperparameters that strongly affect its performance, so choosing an appropriate hyperparameter set is crucial. To improve the performance of SpaRC, a hyperparameter optimization method based on the Tree-structured Parzen Estimator (TPE) is explored. The method uses prior knowledge to tune the parameters efficiently and accelerates the search for the optimal parameters by reducing the computation needed to reach the best clustering result, thus avoiding expensive parameter exploration. Experiments on long-read (PacBio) and short-read (CAMI2) data show that the proposed method substantially improves the performance of SpaRC.
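To make the TPE idea in the abstract concrete, the following is a minimal one-dimensional sketch using only the Python standard library. It is not the authors' implementation and does not call SpaRC: `toy_loss` is a hypothetical stand-in for a clustering-quality score, and the names `tpe_minimize`, `parzen_density`, the fixed kernel bandwidth, and all default values are assumptions chosen for illustration. In practice a library such as Hyperopt would normally be used instead of hand-written code.

```python
import math
import random


def parzen_density(x, centers, bandwidth):
    """Parzen (kernel) density: average of Gaussians centred on past trials."""
    norm = len(centers) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers) / norm


def tpe_minimize(objective, low, high, n_trials=60, n_startup=10,
                 gamma=0.25, n_candidates=24, seed=0):
    """Simplified TPE for one continuous hyperparameter in [low, high]."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # assumed fixed kernel width
    trials = []                      # list of (x, loss) pairs
    for i in range(n_trials):
        if i < n_startup:
            # Random exploration before there is enough history.
            x = rng.uniform(low, high)
        else:
            # Split history into the best gamma fraction ("good") and the rest.
            ranked = sorted(trials, key=lambda t: t[1])
            n_good = max(1, math.ceil(gamma * len(ranked)))
            good = [t[0] for t in ranked[:n_good]]
            bad = [t[0] for t in ranked[n_good:]]
            # Sample candidates from the "good" density l(x).
            candidates = []
            for _ in range(n_candidates):
                c = rng.gauss(rng.choice(good), bandwidth)
                candidates.append(min(max(c, low), high))
            # Keep the candidate with the largest ratio l(x) / g(x).
            x = max(
                candidates,
                key=lambda c: parzen_density(c, good, bandwidth)
                / (parzen_density(c, bad, bandwidth) + 1e-12),
            )
        trials.append((x, objective(x)))
    return min(trials, key=lambda t: t[1])


def toy_loss(x):
    """Hypothetical loss with its minimum at x = 3 (not a SpaRC metric)."""
    return (x - 3.0) ** 2


best_x, best_loss = tpe_minimize(toy_loss, 0.0, 10.0)
print(f"best x = {best_x:.3f}, loss = {best_loss:.5f}")
```

Each new trial is chosen where past good results are dense and past poor results are sparse, which is how TPE spends fewer evaluations than an exhaustive parameter sweep. For a real SpaRC run, each call to the objective would be a full clustering job, so the number of trials matters far more than in this toy case.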

Key words: Hyperparameter optimization, Metagenomics, Sequence fragment clustering, SpaRC, TPE

CLC Number: 

  • TP399