计算机科学 ›› 2018, Vol. 45 ›› Issue (7): 230-236.doi: 10.11896/j.issn.1002-137X.2018.07.040

所属专题: 医学图像

• 图形图像与模式识别 • 上一篇    下一篇

一种基于突变基因网络的癌症驱动通路识别算法

郭炳1,郑文萍2,韩素青1   

  1. 太原师范学院计算机科学与技术系 山西 晋中0306191;
    山西大学计算机与信息技术学院 太原0300062
  • 收稿日期:2018-01-28 出版日期:2018-07-30 发布日期:2018-07-30
  • 作者简介:郭 炳(1985-),男,硕士,助教,CCF会员,主要研究方向为生物信息学与机器学习,E-mail:guobing@tynu.edu.cn(通信作者);郑文萍(1979-),女,博士,副教授,硕士生导师,主要研究方向为图论算法与生物信息学;韩素青(1964-),女,博士,副教授,硕士生导师,主要研究方向为数据挖掘与机器学习。
  • 基金资助:
    本文受山西省回国留学人员科研基金(2017-014),国家自然科学基金(61572005),山西省软科学研究项目(2016041036-4)资助。

Driver Pathway Identification Algorithm Based on Mutated Gene Networks for Cancer

GUO Bing1,ZHENG Wen-ping2,HAN Su-qing1   

  1. Department of Computer Science and Technology,Taiyuan Normal University,Jinzhong,Shanxi 030619,China1;
    School of Computer & Information Technology,Shanxi University,Taiyuan 030006,China2
  • Received:2018-01-28 Online:2018-07-30 Published:2018-07-30

摘要: 大型癌症基因组项目(TCGA,ICGC等)产生了大量的癌症组学数据,使人们深入研究癌症变为可能,其中寻找引发癌症的相关突变基因是一个重要挑战。在癌细胞中,基因变异可分为两类:一类是可导致癌症发生的驱动突变(driver mutation),另一类是对癌症发生扩散没有影响的乘客突变(passenger mutation)。识别癌症驱动基因有利于理解癌症发病原理和发展进程以及研发癌症药物或进行靶向治疗,是生物信息学中的重要问题。文中提出一种基于突变基因网络的癌症驱动通路识别算法GNDP,对癌症病人的体细胞突变数据进行分析。该算法定义了非重叠平衡度来度量基因对的位于同一驱动通路的可能性;根据基因对的非重叠平衡度、互斥和覆盖度,构建基因互斥网络,很大程度上减少了网络边数,提高了计算效率;在所构造的基因互斥网络中将查找到的极大团作为潜在驱动通路基因集合;用覆盖度和互斥度对潜在驱动通路基因集合进行筛选,得到其极大权重子团,并将其作为识别出的驱动通路。分别在模拟数据、肺腺癌以及多形性成胶质细胞瘤突变数据上对GNDP算法进行有效性验证,并将其与经典驱动通路识别算法Dendrix和Multi-Dendrix进行实验对比。结果表明,GNDP不需要指定驱动通路的基因个数,能在模拟数据上准确检测出所有人工设置的驱动通路;针对肺腺癌和多形性成胶质细胞瘤突变数据,GNDP在不需要任何先验知识的情况下达到较高的识别准确率,能高效地识别出主要驱动通路,其结果优于对比算法。

关键词: 癌症基因组, 基因互斥网络, 极大团, 驱动通路, 体细胞突变

Abstract: Large cancer genome projects such as The Cancer Genome Atlas(TCGA) and International Cancer Genome Consortium(ICGC) have produced big amount of data collected from patients with different cancer types.The identification of mutated genes causing cancer is a significant challenge.Genovariation in cancer cells can be divided into two types:functional driver mutation and random passenger mutation.Identifcation of driver genes is benefit to understand the pathogenesis and progression of cancer,as well as research cancer drug and targeted therapy,and it is an essential problem in the field of bioinformatics.This paper proposed a driver pathway identification algorithm based on mutated gene networks for cancer(GNDP).In GNDP,a nonoverlap balance metric is defined to measure the possibility of two genes lying in the same driver pathway.To reduce the complexity of the constructed mutually exclusive gene networks,the nonoverlap balance metric,the exclusivity and the coverage of a gene pair are computed first,and then the edges with low nonoverlap balance metric,low exclusivity and low coverage are deleted.Then,all maximal cliques which might be potential driver pathways are found out.After that,the weight of each clique is assigned as the product of its exclusive degree and coverage degree and then every node of a clique will be checked to judge whether is’s deletion might obtain a larger weight.At last,the maximal weight cliques are obtained in mutually exclusive gene networks as the final driver pathways.This paper compared GNDP algorithm with classical algorithm Dendrix and Multi-Dendrix on both simulated data sets and somatic mutation data sets.The results show that GNDP can detect all artificial pathways in simulated data.For Lung adenocarcinoma and Glioblastoma data,GNDP shows higher efficiency and accuracy than the comparison algorithms.In addition,GNDP does not need any prior knowledge and does not need to set the number of genes in driver pathways in advance.

Key words: Cancer genome, Driver pathways, Gene networks, Maximal cliques, Somatic mutation

中图分类号: 

  • TP311
[1]HANAHAN D,WEINBERG R A.The hallmarks of cancer[J].Cell,2000,100(1):57-70.
[2]FIDLER I J.The pathogenesis of cancer metastasis:the ‘seed and soil’ hypothesis revisited[J].Nature Reviews Cancer,2003,3(6):453-458.
[3]MCLENDON R,FRIEDMAN A,BIGNER D,et al.Comprehensive genomic characterization defines human glioblastoma genes and core pathways.
[J].Nature,2008,455(7216):1061-1068.
[4]The Cancer Genome Atlas Research Network.Integratedgenomic analyses of ovarian carcinoma[J].Nature,2011,474,609-615.
[5]MEYERSON M,GABRIEL S,GETZ G.Advances in under-standing cancer genomes through second-generation sequencing[J].Nature Reviews Genetics,2010,11(10):685-696.
[6]HUDSON T J,ANDERSON W,ARTEZ A,et al..International network of cancer genome projects[J].Nature,2010,464(7291):993-998.
[7]MARDIS E R,WILSON R K.Cancer genome sequencing:a review[J].Human Molecular Genetics,2009,18(R2):R163.
[8]BARRETINA J,CAPONIGRO G,STRANSKY N,et al.TheCancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity[J].Nature,2012,483(7391):603-607.
[9]MENG J,LI R,HAO H.Gene microarray data classificationbased on intersecting neighborhood rough set[J].Computer Scien-ce,2015,42(6):37-40.(in Chinese)
孟军,李锐,郝涵.基于相交邻域粗糙集的基因微阵列数据分类[J].计算机科学,2015,42(6):37-40.
[10]OVERDEVEST J B,THEODORESCU D,LEE J K.Utilizing the Molecular Gateway:The path to personalized cancermana-gement[J].Clinical Chemistry,2009,55(4):684-697.
[11]SWANTON C,CALDAS C.Molecular classification of solidtumours:towards pathway-driven therapeutics[J].British Journal of Cancer,2009,100(10):1517-1522.
[12]ZHENG C H,YANG W,CHONG Y W,et al.Identification of mutated driver pathways in cancer using a multi-objective optimization model[J].Computers in Biology & Medicine,2016,72:22-29.
[13]GREENMAN C,STEPHENS P,SMITH R,et al.Patternsof somatic mutation in human cancer genomes[J].Nature,2007,446(7132):153-158.
[14]BATCHELOR T T,BETENSKY R A,ESPOSITO J M,et al.Age-dependent prognostic effects of genetic alterations in glioblastoma[J].Clinical Cancer Research:An Official Journal of the American Association for Cancer Research,2004,10(1):228-233.
[15]VOGELSTEIN B,KINZLER K W.Cancer genes and the pathways they control[J].Nature Medicine,2004,10(8):789-799.
[16]XIE Q Q,LI D F,ZHANG W.Two novel tree structure-based methods for gene selection[J].Computer Science,2015,42(7):250-253.(in Chinese)
谢倩倩,李订芳,章文.两种基于树结构的基因选择算法[J].计算机科学,2015,42(7):250-253.
[17]VANDIN F,UPFAL E,RAPHAEL B J.De novo discovery of mutated driver pathways in cancer[J].Genome Research,2012,22(2):375-85.
[18]DING L,GETZ G,WHEELER D A,et al.Somatic mutations affect key pathways in lung adenocarcinoma[J].Nature,2008,455(7216):1069-1075.
[19]LAWRENCE M S,STOJANOV P,POLAK P,et al.Mutational heterogeneity in cancer and the search for new cancer genes[J].Nature,2013,499(7457):214-218.
[20]BOCA S M,KINZLER K W,VELCULESCU V E,et al.Pa-tient-oriented gene set analysis for cancer mutation data[J].Genome Biology,2010,11(11):R112.
[21]EFRONI S,BENHAMO R,EDMONSON M,et al.Detectingcancer gene networks characterized by recurrent genomic alterations in a population[J].Plos One,2011,6(1):e14437.
[22]HAHN W C,WEINBERG R A.Modelling the molecular circuitry of cancer[J].Nature Reviews Cancer,2002,2(5):331-341.
[23]VASKE C J,BENZ S C,SANBORN J Z,et al.Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM[J].Bioinformatics,2010,26(12):i237-i245.
[24]BASHASHATI A,HAFFARI G,DING J,et al.DriverNet:uncovering the impact of somatic driver mutations on transcriptional networks in cancer[J].Genome Biology,2012,13(12):R124.
[25]ZENG X X,LIAO Y L,LIU Y S,et al.Prediction and validation of disease genes using HeteSim Scores[J].IEEE/ACM Transactions on Computational Biology & Bioinformatics,2016,PP(99):1.
[26]CHEN X,YAN G Y,REN W,et al.Modularized random walk with restart for candidate disease genes prioritization[C]∥The Third International Symposium on Optimization and SystemsBio-logy(OSB’09).2009:353-360.
[27]CHEN X,HUANG L,XIE D,et al.EGBMMDA:Extreme gradient boosting machine for miRNA-disease association prediction[J].Cell Death & Disease,2018,9(1):3.
[28]MILLER C A,SETTLE S H,SULMAN E P,et al.Discovering functional modules by identifying recurrent and mutually exclusive mutational patterns in tumors[J].BMC Medical Genomics,2011,4(1):34.
[29]ZHAO J F,ZHANG S H,WU L Y,et al.Efficient methods for identifying mutated driver pathways in cancer[J].Bioinforma-tics,2012,28(22):2940-2947.
[30]LEISERSON M D M,BLOKH D,SHARAN R,et al.Simulta-neous identification of multiple driver pathways in cancer[J].Plos Computational Biology,2013,9(5):e1003054.
[31]WU H,GAO L,LI F,et al.Identifying overlapping mutateddriver pathways by constructing gene networks in cancer[J].Bmc Bioinformatics,2015,16(S5):S3.
[32]WU H.Algorithm for detecting driver pathways in cancer based on mutated gene networks[J/OL].Chinese Journal of Compu-ter,(2017-02-24).http://kns.cnki.net/kcms/detail/11.1826.TP.20170224.1408.101.html.(in Chinese)
吴昊.基于突变基因网络的致癌驱动通路检测算法[J/OL].计算机学报,(2017-02-24).http://kns.cnki.net/kcms/detail/11.1826.TP.20170224.1408.101.html.
[33]SZCZUREK E,BEERENWINKEL N.Modeling mutual exclusivity of cancer mutations[J].Plos Computational Biology,2014,10(3):e1003503.
[1] 蔡齐荣, 吴璟莉.
基于多组学数据识别癌症驱动通路的模型和算法
Model and Algorithm for Identifying Driver Pathways in Cancer by Integrating Multi-omics Data
计算机科学, 2019, 46(9): 310-314. https://doi.org/10.11896/j.issn.1002-137X.2019.09.047
[2] 朱金彬,武继刚,隋秀峰.
基于极大团的边缘云节点聚合算法
Edge Cloud Clustering Algorithm Based on Maximal Clique
计算机科学, 2018, 45(4): 60-65. https://doi.org/10.11896/j.issn.1002-137X.2018.04.008
[3] 黄治国,李娜.
基于分辨函数的极大团搜索算法
Discernibility Function-based Algorithm for Finding All Maximal Cliques
计算机科学, 2014, 41(4): 248-251.
[4] 王宁,杨扬,巩华荣,赵耀培,孟坤.
一种基于极大团的关键时间段挖掘方法
Mining Method of Key Time Interval Based on Maximum Clique
计算机科学, 2012, 39(6): 166-169.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!