Computer Science ›› 2016, Vol. 43 ›› Issue (7): 191-196.doi: 10.11896/j.issn.1002-137X.2016.07.035

Previous Articles     Next Articles

Local Pattern Query from Gene Expression Data

JIANG Tao, LI Zhan-huai, SHANG Xue-qun, CHEN Bo-lin and LI Wei-bang   

  • Online:2018-12-01 Published:2018-12-01

Abstract: Local pattern mining plays an important role in gene expression data analysis.One classical model in local pattern mining is order-preserving subMatrix (OPSM),which captures the general tendency of subset of genes in subset of conditions.With the development of high-throughput gene microarray techniques,it produces massive of gene expression datasets.In this situation,it is urgent to design high performance algorithms.Most of the existing methods are batch mining technique,even though it can be addressed by query method,the comprehensiveness and behaviors still should be improved.To make data analysis efficient and accurate,we first proposed a prefix-tree based indexing method for gene expression data,then gave a column keyword based OPSM query methods.It uses index and search method instead of batch mining to query positive,negative and time-delayed OPSMs.We conducted extensive experiments and compared our method with existing methods.The experimental results demonstrate that the proposed method is efficient and scalable.

Key words: Gene expression data,Local pattern,Order-preserving submatrix,Keyword-based query

[1] Xue A R,Ju S G,He W H,et al.Study on Algorithms for Local Outlier Detection[J].Chinese Journal of Computers,2007,0(8):1455-1463(in Chinese) 薛安荣,鞠时光,何伟华,等.局部离群点挖掘算法研究[J].计算机学报,2007,0(8):1455-1463
[2] Yang M.Research on Algorithms for Outlier Detection[D].Wuhan:Huazhong University of Science and Technology,2012(in Chinese)杨茂林.离群检测算法研究[D].武汉:华中科技大学,2012
[3] Hu T T.Research on Outlier Detection Algorithm in Data Ming[D].Xiamen:Xiamen University,2014(in Chinese) 胡婷婷.数据挖掘中的离群点检测算法研究[D].厦门:厦门大学,2014
[4] Angiulli F,Basta S,Pizzuti C.Distance-based detection and prediction of outlier[J].IEEE Trans.Knowledge and Data Eng.,2006,2(18):145-160
[5] Wang Hai-xun,Wang Wei,Yang Jiong,et al.Clustering by Pattern Similarity in Large Data Sets [C]∥Proceedings of the 28th ACM SIGMOD International Conference on Management of Data.2002:394-405
[6] Pensa Ruggero G,Boulicaut Jean-Francois.Constrained Co-clustering of Gene Expression Data[C]∥Proceedings of the 8th SIAM International Conference on Data Mining (SDM).2008:25-36
[7] Faris A,Bader Joel S,Rajul A,et al.Query-based Biclusteringusing Formal Concept Analysis [C]∥Proceedings of the 12th SIAM International Conference on Data Mining (SDM).2012:648-659
[8] Jiang Tao,Li Zhan-huai,Chen Qun,et al.Towards OrderPre-serving SubMatrix Search and Indexing [M]∥ Database Systems for Advanced Applications: Proceedings of the 20th International Conference on Database Systems for Advanced Applications (DASFAA) Part II,2015:309-326
[9] Jiang Tao,Li Zhan-huai,Shang Xue-qun,et al.Constrained Query of Order-Preserving SubMatrix in Gene Expression Data [J].Frontiers of Computer Science,2016,0(5):1-5
[10] Wassim A,Mourad E,Hao Jin-kao.BicFinder:a Biclustering Algorithm for Microarray Data Analysis [J].Knowledge and Information Systems,2012,30(2):341-358
[11] Yang Jiong,Wang Wei,Wang Hai-xun,et al.δ-Clusters:Capturing Subspace Correlation in a Large Data Set [C]∥Procee-dings of the 18th International Conference on Data Engineering (ICDE).IEEE press,2002:517-528
[12] Cho H,Dhillon Inderjit S,Guan Yu-qiang,et al.Minimum Sum-Squared Residue Co-clustering of Gene Expression Data [C]∥Proceedings of the 4th SIAM International Conference on Data Mining (SDM).SIAM Press,2004:114-125
[13] Chen Shu-hua,Liu Juan,Zeng Tao.MMSE:A Generalized Coherence Measure for Identifying Linear Patterns[C]∥Procee-dings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM).IEEE press,2014:489-492
[14] Matteo D,Alessandro F,Manuele B.Biclustering Gene Expressions using Factor Graphs and the Max-sum Algorithm[C]∥Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI).AAAI Press,2015:925-931
[15] Zhao Yu-hai,Wang Guo-ren,Yin Ying,et al.A Novel Approach to Revealing Positive and Negative Co-Regulated Genes [J].Journal of Computer Science and Technology,2007,2(2):261-272
[16] Chen Jiun-rung,Chang Ye-in.An of Up-Down Bit Pattern Approach to Coregulated and Negative-Coregulated Gene Clutering of Microarray Data[J].Journal of Computational Biology,2011,18(12):1777-1791
[17] Zhao Yu-hai,Yu Xu,Wang Guo-ren,et al.Maximal Subspace Coregulated Gene Clustering [J].IEEE Transactions on Know-ledge and Data Engineering,2008,20(1):83-98
[18] Wang Guo-ren,Yin Lin-jun,Zhao Yu-hai,et al.Efficiently Mi-ning Time-Delayed Gene Expression Patterns [J].IEEE Transactions on Systems,Man,and Cybernetics,Part B,2010,40(2):400-411
[19] Wang Guo-ren,Zhao Yu-hai,Zhao Xiang-guo,et al.Efficiently Mining Local Conserved Clusters from Gene Expression Data [J].Neurocomputing,2010,73(7-9):1425-1437
[20] Yin Ying,Zhao Yu-hai,Zhang Bin,et al.Mining Synchronous and Asynchronous Co-Regulated Gene Clusters from Time Series Microarray Data [J].Chinese Journal of Computers,2007,0(8):1302-1314(in Chinese) 印莹,赵宇海,张斌,等.时序微阵列数据中的同步和异步共调控基因聚类[J].计算机学报,2007,30(8):1302-1314
[21] Amichai P,Saharon R.Optimal Set Cover Formulation for Exclusive Row Biclustering of Gene Expression [J].Journal of Computer Science and Technology,2014,9(3):423-435
[22] Rui H,Maderia Sara C.BicSPAM:Flexible Biclustering usingSequential Patterns [J].BMC Bioinformatics,2014,5(1):1-20
[23] Trapp Andrew C,Li Chao,Patrick F.Recovering All Generalized Order-Preserving SubMatrices:New Exact Formulations and Algorithms [J/OL].Annals of Operations Research,2016.http://link.springer.com/article/10.1007%2Fs10479-016-2173-9
[24] Jiang Tao,Li Zhan-huai,Chen Qun,et al.Parallel Partitioningand Mining Gene Expression Data with Butterfly Network [M]∥ Database and Expert Systems Applications:Proceedings of the 24th International Conference on Database and Expert Systems Applications (DEXA),Part I.2013:129-144
[25] Jiang Tao,Li Zhan-huai,Chen Qun,et al.OMEGA:An Order-Preserving SubMatrix Mining,Indexing and Search Tool [M]∥ Machine Learning and Knowledge Discovery in Database: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/ PKDD),Part III.2015:303-307
[26] Broad Institute.Datasets.rar and 5q_gct[DB/OL].http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
[27] Gao B J,Griffith O L,Ester M,et al.On the Deep Order-preserving Submatrix Problem:a Best Effort Approach[J].IEEE Transactions on Knowledge and Data Engineering,2012,4(2):309-325

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!