Computer Science, 2019, Vol. 46, Issue (4): 66-72. doi: 10.11896/j.issn.1002-137X.2019.04.010

• Big Data and Data Science •

High Order Statistics Structured Sparse Algorithm for Image Genetic Association Analysis

RU Feng, XU Jin, CHANG Qi, KAN Dan-hui

  1. School of Electronics and Control Engineering, Chang'an University, Xi'an 710064, China
  • Received: 2018-08-27  Online: 2019-04-15  Published: 2019-04-23
  • Corresponding author: RU Feng (born 1969), male, Ph.D., professor; main research interests: data mining and pattern recognition. E-mail: 35831406@qq.com
  • About the authors: XU Jin (born 1995), female, M.S.; main research interests: machine learning and pattern recognition. CHANG Qi (born 1992), female, M.S.; main research interests: machine learning and pattern recognition. KAN Dan-hui (born 1993), female, M.S.; main research interests: machine learning and pattern recognition.
  • Funding:
    This work was supported by the Xi'an Key Laboratory of Intelligent Expressway Information Fusion and Control (201805062ZD13CG46).

Abstract: Advances in neuroimaging and molecular genetics have produced large volumes of imaging genetics data, which have greatly advanced research on complex mental disorders. However, these data are extremely high-dimensional, and conventional correlation measures assume that the data follow a Gaussian distribution. As a result, traditional algorithms often fail to explain the dependencies between the two types of data. To address these shortcomings, this paper proposes a method for association analysis of large-scale SNP and fMRI data. The method constructs a sparse network structure over the features to guide fused lasso in feature selection. It also uses higher-order statistics to extract statistically significant variables, which allows it to identify biomarkers associated with mental disorders. On simulated data, the distribution of the canonical vector values obtained by the proposed algorithm is nearly identical to the distribution of the true values, and the resulting correlation coefficient is the closest to the true correlation coefficient of the dataset. The average correlation coefficient of the proposed algorithm reaches up to 81%, about 20% higher than L1-SCCA and about 3% higher than FL-SCCA. On real data, the proposed algorithm identifies more genes and brain regions with potential effects on schizophrenia than the other two algorithms. These results show that the algorithm can effectively identify risk genes and abnormal brain regions within a reasonable time.
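The abstract does not spell out how the feature network guides fused lasso. Earlier structured sparse CCA work commonly uses a construction called the graph-guided fused lasso. In that construction, two features are linked when their correlation is strong. Each link then contributes the penalty |r_ij| · |u_i − sign(r_ij) · u_j|. This generalizes the fused lasso, which penalizes differences between neighbouring coefficients, from a chain to an arbitrary graph. The stdlib-only Python sketch below assumes that construction. The threshold `tau`, the weights `lam1` and `lam2`, and every function name are hypothetical illustrations, not the paper's implementation. The sketch also includes no optimizer for the canonical vectors.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def feature_network(X, tau):
    """Build a sparse feature graph from a sample-by-feature matrix X (list of rows).

    An edge (i, j, r_ij) is kept only when |corr(feature i, feature j)| > tau,
    so weakly related feature pairs stay unconnected.
    """
    cols = list(zip(*X))  # transpose rows into per-feature columns
    edges = []
    for i, j in combinations(range(len(cols)), 2):
        r = pearson(cols[i], cols[j])
        if abs(r) > tau:
            edges.append((i, j, r))
    return edges


def graph_fused_penalty(u, edges):
    """Graph-guided fusion: sum of |r_ij| * |u_i - sign(r_ij) * u_j| over the edges.

    Positively correlated features are pushed toward equal weights,
    negatively correlated ones toward weights of opposite sign.
    """
    return sum(abs(r) * abs(u[i] - math.copysign(1.0, r) * u[j]) for i, j, r in edges)


def structured_penalty(u, edges, lam1, lam2):
    """Hypothetical combined regularizer: lam1 * ||u||_1 + lam2 * graph fusion."""
    return lam1 * sum(abs(x) for x in u) + lam2 * graph_fused_penalty(u, edges)


# Toy usage: feature 1 is exactly 2 * feature 0; feature 2 is only weakly related.
X = [[1, 2, 5], [2, 4, 1], [3, 6, 4], [4, 8, 2]]
edges = feature_network(X, tau=0.5)
print(edges)                                         # [(0, 1, 1.0)]
print(graph_fused_penalty([0.5, 0.5, 0.0], edges))   # 0.0
print(graph_fused_penalty([0.5, -0.5, 0.0], edges))  # 1.0
```

With `lam2 = 0` the regularizer reduces to a plain L1 penalty, the kind used by L1-SCCA-style baselines. The graph term is what lets correlated SNPs or voxels be selected together rather than one at a time.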

Key words: Association analysis, Feature selection, Higher-order statistics, Imaging genetics, Sparse representation
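The abstract states that higher-order statistics are used to extract statistically significant variables, but it does not name the statistic. Negentropy is one standard choice, and it fits the abstract's point that Gaussian-based correlation measures miss part of the dependence. Following Hyvärinen's approximation, negentropy can be estimated as J(y) ∝ (E[G(y)] − E[G(ν)])², with G(u) = log cosh u and ν a standard Gaussian variable. The estimate is near zero for Gaussian data and grows as a variable departs from Gaussianity. The Python sketch below assumes this statistic, although kurtosis or another cumulant would be equally plausible readings. The function names are hypothetical. The sketch needs Python 3.8+ for `statistics.fmean` and `statistics.NormalDist`.

```python
import math
import statistics


def _expected_logcosh_gaussian(steps=4000, lim=10.0):
    """E[log cosh(nu)] for nu ~ N(0, 1), via composite Simpson's rule on [-lim, lim]."""
    h = 2 * lim / steps  # steps must be even for Simpson's rule
    total = 0.0
    for k in range(steps + 1):
        x = -lim + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * math.log(math.cosh(x)) * math.exp(-x * x / 2)
    return total * h / 3 / math.sqrt(2 * math.pi)


# Reference value E[G(nu)] for a standard Gaussian (roughly 0.375).
GAUSS_REF = _expected_logcosh_gaussian()


def negentropy(values):
    """Approximate negentropy (E[G(y)] - E[G(nu)])^2 with G(u) = log cosh u.

    The constant factor of the approximation is dropped because only the
    ranking of variables matters in this sketch.
    """
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0  # a constant variable carries no information
    z = [(v - mu) / sd for v in values]  # standardize: zero mean, unit variance
    eg = statistics.fmean([math.log(math.cosh(t)) for t in z])
    return (eg - GAUSS_REF) ** 2


def rank_by_negentropy(columns):
    """Indices of the feature columns, most non-Gaussian first."""
    scores = [negentropy(c) for c in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)


# Toy usage: a near-Gaussian column versus a two-valued, strongly non-Gaussian one.
nd = statistics.NormalDist()
gaussian_like = [nd.inv_cdf((k + 0.5) / 1000) for k in range(1000)]
binary = [-1.0, 1.0] * 500
print(rank_by_negentropy([gaussian_like, binary]))  # [1, 0]
```

Standardizing each column first makes the score depend only on the shape of the distribution, not its scale. That is why a non-Gaussianity measure can flag variables that a variance- or correlation-based screen would treat identically.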

CLC number: TP301.6