Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230500006-7. doi: 10.11896/jsjkx.230500006

• Big Data & Data Science •

Cancer Subtype Prediction Based on Similar Network Fusion Algorithm

ZHANG Xiaoxi1, LI Dongxi2

  1. 1 College of Mathematics, Taiyuan University of Technology, Taiyuan, Shanxi 030600, China
    2 College of Big Data, Taiyuan University of Technology, Taiyuan, Shanxi 030600, China
  • Published: 2024-06-06
  • Corresponding author: LI Dongxi (dxli0426@126.com)
  • About author: ZHANG Xiaoxi (1518998276@qq.com), born in 1997, postgraduate. Her main research interests include data mining and analysis.
    LI Dongxi, born in 1982, Ph.D, associate professor. His main research interests include data mining and biostatistics.
  • Supported by:
    National Natural Science Foundation of China (11571009), Basic Research Project of Shanxi Province (201901D111086), Key Research and Development Project of Shanxi Province (202102020101004), and Research Support Program of Shanxi Province for Returned Overseas Students (2022-074).

Abstract: Mining gene-gene interactions from gene expression data and constructing gene regulatory networks is an important research topic in bioinformatics. However, currently popular neural networks consider only the interactions and associations between genes in their architecture, and ignore those between patients. To address this, this paper proposes WGCSS (Weighted Genetic Correlation network and Sample Similarity network), a cancer subtype prediction model based on an algorithm that fuses a weighted gene similarity network with a sample similarity network. The method fuses information from the feature space and the sample space, accounts for interactions both between genes and between samples, and uses a graph convolutional network for prediction. Because aggregating information in two spaces causes a serious oversmoothing problem, a residual layer is introduced into the model to alleviate it. By aggregating information from both spaces, the method makes cancer subtype prediction more accurate. To verify its generalization performance, the method is evaluated on invasive breast carcinoma (BRCA), glioblastoma multiforme (GBM), and lung cancer (LUNG) datasets, and the resulting high classification accuracy indicates its superiority. Survival analysis is also performed on the three datasets, showing that the cancer subtypes predicted by the method have significantly different survival curves in all three.

Key words: Weighted gene similarity network, Sample similarity network, Residual graph convolutional network, L1 regularization, Cancer subtype prediction
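
The abstract gives no formulas, so the following is only a minimal sketch of the kind of computation it describes, not the authors' WGCSS implementation. It assumes a WGCNA-style soft-thresholded correlation for the gene network, a thresholded Pearson correlation for the sample network, and the standard symmetric GCN normalization. All function names, `beta`, `threshold`, and the toy data are hypothetical choices made for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    adj = adj + np.eye(adj.shape[0])
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]


def gene_similarity(x, beta=6):
    """Gene-gene adjacency |corr|^beta (assumed WGCNA-style soft threshold)."""
    adj = np.abs(np.corrcoef(x, rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj


def sample_similarity(x, threshold=0.0):
    """Sample-sample adjacency from thresholded Pearson correlation (assumed)."""
    corr = np.corrcoef(x)
    adj = np.where(corr > threshold, corr, 0.0)
    np.fill_diagonal(adj, 0.0)
    return adj


def residual_gcn_layer(a_sample, h, a_gene, weight):
    """Aggregate over samples (rows) and genes (columns), then add a skip connection."""
    aggregated = a_sample @ h @ a_gene               # propagate in both spaces
    return np.maximum(aggregated @ weight, 0.0) + h  # ReLU, then residual


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))                  # toy data: 8 samples, 5 genes
a_gene = normalize_adjacency(gene_similarity(x))
a_sample = normalize_adjacency(sample_similarity(x))
weight = rng.normal(scale=0.1, size=(5, 5))  # square so the skip connection fits
h1 = residual_gcn_layer(a_sample, x, a_gene, weight)
print(h1.shape)
```

In this sketch, a layer whose weights are all zero returns its input unchanged. That is one way to see why a skip connection can help node features stay distinct as layers are stacked, which is the oversmoothing concern the abstract raises.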

CLC Number:

  • TP399