Computer Science, 2020, Vol. 47, Issue (6A): 70-74. doi: 10.11896/JsJkx.190900065

• Artificial Intelligence •

Improving Hi-C Data Resolution with Deep Convolutional Neural Networks

CHENG Zhe, BAI Qian, ZHANG Hao, WANG Shi-pu and LIANG Yu

  1. School of Software, Yunnan University, Kunming 650000, China
  • Published: 2020-07-07
  • Corresponding author: LIANG Yu (yuliang@ynu.edu.cn)
  • About author: CHENG Zhe (chengzhe_xg@foxmail.com), born in 1994, postgraduate. His main research interests include deep learning, computer vision and bioinformatics.
  • Funding: National Natural Science Foundation of China (61762089, 91631305, 61863036)

Abstract: Hi-C is a technique that measures the frequency of all pairwise interactions across the entire genome, and it has become one of the most popular tools for studying the 3D structure of genomes. Hi-C studies typically require sequencing a large amount of chromosomal data. Hi-C data with lower sequencing depth are cheaper to produce, but they do not carry enough biological information for downstream analyses. Because Hi-C data contain recurring sub-patterns and are continuous within local regions, the missing detail can be predicted. This paper explores an improved method based on a convolutional neural network model: it predicts the core Hi-C values from a larger surrounding region and extends the depth and receptive field of the network, so that Hi-C data at the original sequencing depth are predicted from 1/16 of the original sequencing reads. The results were evaluated with the Pearson and Spearman correlation coefficients; significant interaction pairs were analyzed with Fit-Hi-C, and chromatin states were analyzed using the 12 chromatin-state regions annotated by ChromHMM. The experimental results show that the predictions not only closely match the numerical distribution of the original data, but are also more reliable than the low-resolution Hi-C data in terms of site interaction information and chromatin state.

Key words: Hi-C technology, Super-resolution, Convolutional neural network, Bioinformatics, Deep learning
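The abstract says the model predicts the core Hi-C values from a larger surrounding region and widens the network's receptive field by making it deeper. The sketch below shows the bookkeeping this implies, assuming stride-1, unpadded ("valid") convolutions with odd kernel sizes. The function names, the 28-bin core and both layer layouts are hypothetical illustrations, not the paper's architecture.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def receptive_field(kernel_sizes: Sequence[int]) -> int:
    """Receptive field (in bins) of stacked stride-1, undilated convolutions.

    Each k x k layer widens the region seen by one output bin by k - 1.
    """
    return 1 + sum(k - 1 for k in kernel_sizes)


def input_window(core: int, kernel_sizes: Sequence[int]) -> int:
    """Input side length needed for unpadded convolutions to output a
    core x core block in which every bin sees a full receptive field."""
    return core + receptive_field(kernel_sizes) - 1


def margin(kernel_sizes: Sequence[int]) -> int:
    """Bins trimmed from each side of the input window (odd kernels assumed)."""
    if any(k % 2 == 0 for k in kernel_sizes):
        raise ValueError("this sketch assumes odd kernel sizes")
    return (receptive_field(kernel_sizes) - 1) // 2


def patch_pairs(low: Matrix, high: Matrix, core: int,
                kernel_sizes: Sequence[int]) -> List[Tuple[Matrix, Matrix]]:
    """Cut aligned (low-depth input window, high-depth target core) pairs.

    Windows advance by `core`, so consecutive target cores do not overlap.
    """
    n = len(low)
    w = input_window(core, kernel_sizes)
    m = margin(kernel_sizes)
    pairs = []
    for top in range(0, n - w + 1, core):
        for left in range(0, n - w + 1, core):
            x = [row[left:left + w] for row in low[top:top + w]]
            y = [row[left + m:left + m + core]
                 for row in high[top + m:top + m + core]]
            pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    # Hypothetical layer layouts, not the paper's architecture.
    shallow = [9, 1, 5]
    deeper = [9, 3, 3, 3, 5]
    for name, ks in (("shallow", shallow), ("deeper", deeper)):
        print(name, receptive_field(ks), input_window(28, ks))
```

Running it prints `shallow 13 40` and `deeper 19 46`: each added layer enlarges the region that informs a single predicted bin, which in turn enlarges the input window that must be cut around each 28 x 28 core.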

CLC number: TP391
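The abstract does not state how the 1/16-depth input was produced or exactly how the correlation coefficients were computed. The sketch below assumes the low-depth matrix is simulated by keeping each read pair independently with probability 1/16 (binomial thinning, so the retained total is about, not exactly, 1/16), and compares two contact matrices by pooling their upper triangles. The helper names and the synthetic matrix are illustrative assumptions, not data or code from the paper.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[int]]


def downsample(matrix: Matrix, fraction: float, seed: int = 0) -> Matrix:
    """Keep each read pair independently with probability `fraction`,
    mirroring the upper triangle so the result stays symmetric."""
    rng = random.Random(seed)
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            kept = sum(1 for _ in range(matrix[i][j]) if rng.random() < fraction)
            out[i][j] = out[j][i] = kept
    return out


def upper_triangle(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Flatten the upper triangle (diagonal included) of a symmetric matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(vx * vy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    rng = random.Random(1)
    n = 30
    # Synthetic full-depth matrix with distance decay (illustrative, not real data).
    full = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            full[i][j] = full[j][i] = 2000 // (1 + j - i) + rng.randint(0, 20)
    low = downsample(full, 1 / 16, seed=2)
    x, y = upper_triangle(full), upper_triangle(low)
    print(f"Pearson  {pearson(x, y):.3f}")
    print(f"Spearman {spearman(x, y):.3f}")
```

Because Hi-C counts fall off steeply with genomic distance, pooling all bins lets that decay dominate both coefficients; computing the correlations separately for each diagonal (each genomic distance) is a common refinement of this comparison.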