Computer Science ›› 2020, Vol. 47 ›› Issue (6A): 70-74. doi: 10.11896/JsJkx.190900065

• Artificial Intelligence •

Improving Hi-C Data Resolution with Deep Convolutional Neural Networks

CHENG Zhe, BAI Qian, ZHANG Hao, WANG Shi-pu and LIANG Yu

  1. School of Software, Yunnan University, Kunming 650000, China
  • Published: 2020-07-07
  • Corresponding author: LIANG Yu (yuliang@ynu.edu.cn)
  • About author: CHENG Zhe (chengzhe_xg@foxmail.com), born in 1994, postgraduate. His main research interests include deep learning, computer vision and bioinformatics.
  • Supported by:
    National Natural Science Foundation of China (61762089, 91631305, 61863036)

Abstract: Hi-C is a technique that measures the frequency of all pairwise interactions across the entire genome, and it has become one of the most popular tools for studying the 3D structure of genomes. Hi-C-based studies generally require deep sequencing; data with lower sequencing depth are cheaper to produce but do not carry enough biological information for downstream analysis. Because Hi-C data contain recurring sub-patterns and are continuous within a local region, the missing detail can be predicted. This paper explores an improved method based on a convolutional neural network: the model predicts the central Hi-C values from a larger surrounding region, and the depth and receptive field of the network are extended. Using only 1/16 of the original sequencing reads as input, the model predicts the Hi-C data at the original sequencing depth. The results are evaluated with the Pearson and Spearman correlation coefficients; significant interaction pairs are identified with Fit-Hi-C, and chromatin-state analysis is performed on regions annotated by ChromHMM with 12 chromatin states. The experiments show that the predictions are not only close to the original data in numerical distribution, but also more reliable than the low-resolution Hi-C data in terms of locus interaction information and chromatin state.
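The evaluation setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say how the 1/16-depth input is produced or how the correlations are computed, so the sketch assumes binomial thinning of read counts (each read kept with probability 1/16) and correlations taken over the flattened upper triangle of the contact map. The function names (`downsample_contacts`, `upper_triangle`, `pearson`, `average_ranks`, `spearman`) and the toy matrix are hypothetical, and only the Python standard library is used.

```python
import math
import random


def downsample_contacts(matrix, keep_fraction=1 / 16, seed=0):
    """Binomially thin a symmetric contact matrix (assumed downsampling scheme)."""
    rng = random.Random(seed)
    n = len(matrix)
    low = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Keep each of the matrix[i][j] reads independently with probability keep_fraction.
            kept = sum(1 for _ in range(matrix[i][j]) if rng.random() < keep_fraction)
            low[i][j] = kept
            low[j][i] = kept  # mirror the value so the map stays symmetric
    return low


def upper_triangle(matrix):
    """Flatten the upper triangle (diagonal included) into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic "high-depth" map whose counts decay with distance |i - j| (not real Hi-C data).
high = [[1600 // (1 + abs(i - j)) for j in range(8)] for i in range(8)]
low = downsample_contacts(high)
r = pearson(upper_triangle(high), upper_triangle(low))
rho = spearman(upper_triangle(high), upper_triangle(low))
```

In a real comparison, `low` would be replaced by the model's prediction, and `r` and `rho` would be reported against the original-depth map. Computing the coefficients separately for each genomic distance is a common alternative to flattening the whole triangle, since the strong distance decay of Hi-C counts otherwise dominates the correlation.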

Key words: Bioinformatics, Convolutional neural network, Deep learning, Hi-C technology, Super-resolution

CLC Number:

  • TP391