Computer Science, 2020, Vol. 47, Issue (4): 60-66. doi: 10.11896/jsjkx.190300073

• Database & Big Data & Data Science •

Study on Multimodal Image Genetic Data Based on Deep Principal Correlated Auto-encoders

LI Gang, WANG Chao, HAN De-peng, LIU Qiang-wei, LI Ying

  1. School of Electronic and Control Engineering, Chang'an University, Xi'an 710064, China
  • Received: 2019-03-18 Online: 2020-04-15 Published: 2020-04-15
  • Contact: LI Gang (15229296166@chd.edu.cn), born in 1975, associate professor and postgraduate supervisor, is not a member of the China Computer Federation. His main research interests include image processing and pattern recognition, machine learning, and multimodal biomedical information fusion.
  • Supported by:
    This work was supported by the Science and Technology Innovation Guidance Project of the Xi'an Science and Technology Bureau (201805045YD23CG29(5)), the Fundamental Research Funds for the Central Universities, Chang'an University (CHD) (300102329203), and the Postgraduate Research Innovation Practice Project of Chang'an University (300103002075).

Abstract: Brain imaging phenotypes and genetic variants have become important factors in complex diseases such as schizophrenia. Building on earlier in-depth work on pathogenesis, researchers have proposed many models based on deep neural networks or regularization, typically involving either some form of penalty term or auto-encoders with a reconstruction objective. However, the multimodal data these models use often have more feature dimensions than samples. To address the difficulty of high-dimensional data analysis and overcome the limitations of deep canonical correlation analysis (DCCA), this paper proposes an effective model in which an optimization algorithm solves DCCA using principal component analysis (PCA) for multimodal linear feature learning and a multi-layer belief network based on restricted Boltzmann machines (RBM) for multimodal nonlinear feature learning. The model, together with previous advanced models, is tested and analyzed on a real multimodal dataset. Experiments show that the features learned by the deep principal correlated auto-encoders model have higher correlation and better classification performance than those of existing models. In terms of classification accuracy, the accuracy on both modalities exceeds 90%, a significant improvement over CCA-based models (average accuracy about 65%) and DNN-based models (average accuracy about 80%). In the clustering performance evaluation, the model's average normalized mutual information of 93.75% and average classification error rate of 3.8% further confirm its classification performance. In terms of maximum correlation, when the output dimensions of the top-layer nodes are the same, the model achieves a maximum correlation of 0.926, outperforming other advanced models and showing excellent performance in high-dimensional data analysis.

Key words: Belief networks, Correlation analysis, Deep principal correlated auto-encoders, Image genomics, Optimization algorithms
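
The linear stage described in the abstract (PCA on each modality, followed by a correlation analysis between the two reduced views) can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes plain linear CCA for the RBM-based deep network, uses synthetic data with more features than samples, and the function names `pca_reduce` and `canonical_correlations`, the regularization value `reg`, and all array sizes are assumptions chosen only for illustration.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Economy-size SVD stays cheap even when features >> samples.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def canonical_correlations(A, B, reg=1e-6):
    """Canonical correlations between two views (rows are samples).

    reg is a small ridge term (an assumption) that keeps the
    covariance matrices positive definite for the Cholesky step.
    """
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    n = A.shape[0]
    Saa = A.T @ A / (n - 1) + reg * np.eye(A.shape[1])
    Sbb = B.T @ B / (n - 1) + reg * np.eye(B.shape[1])
    Sab = A.T @ B / (n - 1)
    # Whiten both views with inverse Cholesky factors; the singular
    # values of the whitened cross-covariance are the correlations.
    La = np.linalg.cholesky(Saa)
    Lb = np.linalg.cholesky(Sbb)
    T = np.linalg.solve(La, Sab) @ np.linalg.inv(Lb).T
    return np.linalg.svd(T, compute_uv=False)

# Synthetic stand-in for two modalities sharing a 2-D latent signal.
rng = np.random.default_rng(0)
n_samples = 100
z = rng.normal(size=(n_samples, 2))
X = z @ rng.normal(size=(2, 500)) + 0.1 * rng.normal(size=(n_samples, 500))
Y = z @ rng.normal(size=(2, 300)) + 0.1 * rng.normal(size=(n_samples, 300))

rho = canonical_correlations(pca_reduce(X, 5), pca_reduce(Y, 5))
print("canonical correlations:", np.round(rho, 3))
```

Reducing each view with PCA first keeps the covariance matrices small and well conditioned, which is one common way to handle the case the abstract highlights, where feature dimensions exceed the number of samples.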

CLC Number:

  • TP391