计算机科学 ›› 2016, Vol. 43 ›› Issue (7): 191-196.doi: 10.11896/j.issn.1002-137X.2016.07.035

• 软件与数据库技术 • 上一篇    下一篇

基因表达数据中局部模式的查询

姜涛,李战怀,尚学群,陈伯林,李卫榜   

  1. 西北工业大学计算机学院 西安710072,西北工业大学计算机学院 西安710072,西北工业大学计算机学院 西安710072,西北工业大学计算机学院 西安710072,西北工业大学计算机学院 西安710072
  • 出版日期:2018-12-01 发布日期:2018-12-01
  • 基金资助:
    本文受国家“九七三”重点基础研究发展规划(2012CB316203),国家“八六三”高技术研究发展计划(2015AA015307),国家自然科学基金重点项目(61033007,61332014),国家自然科学基金面上项目(61272121,61572367),中央高校基础研究经费项目(3102015JSJ0011)资助

Local Pattern Query from Gene Expression Data

JIANG Tao, LI Zhan-huai, SHANG Xue-qun, CHEN Bo-lin and LI Wei-bang   

  • Online:2018-12-01 Published:2018-12-01

摘要: 基因表达数据分析一般是通过挖掘局部模式来实现的。保序子矩阵是局部模式挖掘中一种经典的模型,可以获取到在若干条件下表现出一致趋势的一组基因。高通量基因微阵列技术的进步,促进了海量基因表达数据的产生,使得对高性能基因表达数据分析算法的需求极为迫切。现有方法大多数是通过批量挖掘的方法来分析数据,即使有通过查询方式来获取精确结果的方法,其全面性与性能也有待提高。为了提高数据分析的效率与准确性,首先提出一种基于前缀树的基因表达数据索引gIndex,然后给出了一种基于列关键词查询的保序子矩阵分析方法GEQc。其不经过批量挖掘,只需要建立索引并通过关键词来完成正相关/负相关/时滞等模式的查询。实验结果表明,与现有方法相比,所提算法具有良好的数据分析效率与可扩展性。

关键词: 基因表达数据,局部模式,保序子矩阵,关键词查询

Abstract: Local pattern mining plays an important role in gene expression data analysis.One classical model in local pattern mining is order-preserving subMatrix (OPSM),which captures the general tendency of subset of genes in subset of conditions.With the development of high-throughput gene microarray techniques,it produces massive of gene expression datasets.In this situation,it is urgent to design high performance algorithms.Most of the existing methods are batch mining technique,even though it can be addressed by query method,the comprehensiveness and behaviors still should be improved.To make data analysis efficient and accurate,we first proposed a prefix-tree based indexing method for gene expression data,then gave a column keyword based OPSM query methods.It uses index and search method instead of batch mining to query positive,negative and time-delayed OPSMs.We conducted extensive experiments and compared our method with existing methods.The experimental results demonstrate that the proposed method is efficient and scalable.

Key words: Gene expression data,Local pattern,Order-preserving submatrix,Keyword-based query

[1] Xue A R,Ju S G,He W H,et al.Study on Algorithms for Local Outlier Detection[J].Chinese Journal of Computers,2007,0(8):1455-1463(in Chinese) 薛安荣,鞠时光,何伟华,等.局部离群点挖掘算法研究[J].计算机学报,2007,0(8):1455-1463
[2] Yang M.Research on Algorithms for Outlier Detection[D].Wuhan:Huazhong University of Science and Technology,2012(in Chinese)杨茂林.离群检测算法研究[D].武汉:华中科技大学,2012
[3] Hu T T.Research on Outlier Detection Algorithm in Data Ming[D].Xiamen:Xiamen University,2014(in Chinese) 胡婷婷.数据挖掘中的离群点检测算法研究[D].厦门:厦门大学,2014
[4] Angiulli F,Basta S,Pizzuti C.Distance-based detection and prediction of outlier[J].IEEE Trans.Knowledge and Data Eng.,2006,2(18):145-160
[5] Wang Hai-xun,Wang Wei,Yang Jiong,et al.Clustering by Pattern Similarity in Large Data Sets [C]∥Proceedings of the 28th ACM SIGMOD International Conference on Management of Data.2002:394-405
[6] Pensa Ruggero G,Boulicaut Jean-Francois.Constrained Co-clustering of Gene Expression Data[C]∥Proceedings of the 8th SIAM International Conference on Data Mining (SDM).2008:25-36
[7] Faris A,Bader Joel S,Rajul A,et al.Query-based Biclusteringusing Formal Concept Analysis [C]∥Proceedings of the 12th SIAM International Conference on Data Mining (SDM).2012:648-659
[8] Jiang Tao,Li Zhan-huai,Chen Qun,et al.Towards OrderPre-serving SubMatrix Search and Indexing [M]∥ Database Systems for Advanced Applications: Proceedings of the 20th International Conference on Database Systems for Advanced Applications (DASFAA) Part II,2015:309-326
[9] Jiang Tao,Li Zhan-huai,Shang Xue-qun,et al.Constrained Query of Order-Preserving SubMatrix in Gene Expression Data [J].Frontiers of Computer Science,2016,0(5):1-5
[10] Wassim A,Mourad E,Hao Jin-kao.BicFinder:a Biclustering Algorithm for Microarray Data Analysis [J].Knowledge and Information Systems,2012,30(2):341-358
[11] Yang Jiong,Wang Wei,Wang Hai-xun,et al.δ-Clusters:Capturing Subspace Correlation in a Large Data Set [C]∥Procee-dings of the 18th International Conference on Data Engineering (ICDE).IEEE press,2002:517-528
[12] Cho H,Dhillon Inderjit S,Guan Yu-qiang,et al.Minimum Sum-Squared Residue Co-clustering of Gene Expression Data [C]∥Proceedings of the 4th SIAM International Conference on Data Mining (SDM).SIAM Press,2004:114-125
[13] Chen Shu-hua,Liu Juan,Zeng Tao.MMSE:A Generalized Coherence Measure for Identifying Linear Patterns[C]∥Procee-dings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM).IEEE press,2014:489-492
[14] Matteo D,Alessandro F,Manuele B.Biclustering Gene Expressions using Factor Graphs and the Max-sum Algorithm[C]∥Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI).AAAI Press,2015:925-931
[15] Zhao Yu-hai,Wang Guo-ren,Yin Ying,et al.A Novel Approach to Revealing Positive and Negative Co-Regulated Genes [J].Journal of Computer Science and Technology,2007,2(2):261-272
[16] Chen Jiun-rung,Chang Ye-in.An of Up-Down Bit Pattern Approach to Coregulated and Negative-Coregulated Gene Clutering of Microarray Data[J].Journal of Computational Biology,2011,18(12):1777-1791
[17] Zhao Yu-hai,Yu Xu,Wang Guo-ren,et al.Maximal Subspace Coregulated Gene Clustering [J].IEEE Transactions on Know-ledge and Data Engineering,2008,20(1):83-98
[18] Wang Guo-ren,Yin Lin-jun,Zhao Yu-hai,et al.Efficiently Mi-ning Time-Delayed Gene Expression Patterns [J].IEEE Transactions on Systems,Man,and Cybernetics,Part B,2010,40(2):400-411
[19] Wang Guo-ren,Zhao Yu-hai,Zhao Xiang-guo,et al.Efficiently Mining Local Conserved Clusters from Gene Expression Data [J].Neurocomputing,2010,73(7-9):1425-1437
[20] Yin Ying,Zhao Yu-hai,Zhang Bin,et al.Mining Synchronous and Asynchronous Co-Regulated Gene Clusters from Time Series Microarray Data [J].Chinese Journal of Computers,2007,0(8):1302-1314(in Chinese) 印莹,赵宇海,张斌,等.时序微阵列数据中的同步和异步共调控基因聚类[J].计算机学报,2007,30(8):1302-1314
[21] Amichai P,Saharon R.Optimal Set Cover Formulation for Exclusive Row Biclustering of Gene Expression [J].Journal of Computer Science and Technology,2014,9(3):423-435
[22] Rui H,Maderia Sara C.BicSPAM:Flexible Biclustering usingSequential Patterns [J].BMC Bioinformatics,2014,5(1):1-20
[23] Trapp Andrew C,Li Chao,Patrick F.Recovering All Generalized Order-Preserving SubMatrices:New Exact Formulations and Algorithms [J/OL].Annals of Operations Research,2016.http://link.springer.com/article/10.1007%2Fs10479-016-2173-9
[24] Jiang Tao,Li Zhan-huai,Chen Qun,et al.Parallel Partitioningand Mining Gene Expression Data with Butterfly Network [M]∥ Database and Expert Systems Applications:Proceedings of the 24th International Conference on Database and Expert Systems Applications (DEXA),Part I.2013:129-144
[25] Jiang Tao,Li Zhan-huai,Chen Qun,et al.OMEGA:An Order-Preserving SubMatrix Mining,Indexing and Search Tool [M]∥ Machine Learning and Knowledge Discovery in Database: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/ PKDD),Part III.2015:303-307
[26] Broad Institute.Datasets.rar and 5q_gct[DB/OL].http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
[27] Gao B J,Griffith O L,Ester M,et al.On the Deep Order-preserving Submatrix Problem:a Best Effort Approach[J].IEEE Transactions on Knowledge and Data Engineering,2012,4(2):309-325

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!