计算机科学 ›› 2019, Vol. 46 ›› Issue (6): 21-28.doi: 10.11896/j.issn.1002-137X.2019.06.002

• 大数据与数据科学* • 上一篇    下一篇

DNA数据存储技术研究进展

张淑芳1, 彭康1, 宋香明1, 张子昱2, 王汉杰3   

  1. (天津大学电气自动化与信息工程学院 天津300072)1
    (天津大学国际工程师学院 天津300072)2
    (天津大学生命科学学院 天津300072)3
  • 收稿日期:2018-12-23 发布日期:2019-06-24
  • 通讯作者: 张淑芳(1979-),女,博士,副教授,主要研究方向为数字图像处理及视频编码,E-mail:shufangzhang@tju.edu.cn
  • 作者简介:彭 康(1995-),女,硕士,主要研究方向为生物医学图像处理;宋香明(1993-),男,硕士,主要研究方向为生物医学图像处理;张子昱(1995-),男,硕士,主要研究方向为生物医学图像处理;王汉杰(1984-),男,博士,副教授,主要研究方向为纳米生物医学工程。

Research Progress on DNA Data Storage Technology

ZHANG Shu-fang1, PENG Kang1, SONG Xiang-ming1, ZHANG Zi-yu2, WANG Han-jie3   

  1. (School of Electrical and Information Engineering,Tianjin University,Tianjin 300072,China)1
    (Tianjin International Engineering Institute,Tianjin 300072,China)2
    (School of Life Sciences,Tianjin University,Tianjin 300072,China)3
  • Received:2018-12-23 Published:2019-06-24

摘要: 随着计算机技术和网络技术的飞速发展,由此产生的海量数据给传统数据存储方式带来了巨大挑战,因此研究人员开始致力于寻找新一代存储方案。脱氧核糖核酸(Deoxyribonucleic Acid,DNA)作为天然的遗传信息存储介质,具有存储容量大、能耗低和寿命长等优点,有效克服了传统硬盘和计算机存储等方式的不足,故DNA数据存储技术成为信息技术和生物技术交叉领域的研究热点。文中综述了DNA数据存储技术的研究进展,首先对DNA及其存储的理论框架进行了介绍;然后详细阐述了DNA数据存储中的编码技术:二进制数据的压缩编码算法、纠错算法以及二进制数据到DNA 4种碱基的转换方法;最后对现阶段已有的DNA存储方案进行了分析,并对DNA数据存储研究存在的挑战进行了讨论。

关键词: DNA, 存储密度, 纠错算法, 数据存储, 压缩编码

Abstract: With the rapid development of computer technology and network technology,the massive generated data have brought great challenges to traditional data storage methods,so researchers begin to focus on finding a new generation of storage scheme.As a natural genetic information storage medium,Deoxyribonucleic acid (DNA) has advantages of large storage capacity,low energy consumption and long life,which effectively overcome the shortcomings of traditional storage methods,such as hard disk and computer storage.The DNA data storage method has become a research hotspot in the intersection field of information and biotechnology.This paper reviewed the research progress on DNA data stora-ge technology.Firstly,DNA and its theoretical framework of storage are introduced.Then,the coding technologies in DNA data storage are elaborated,which includes compression coding algorithm of binary data,error correction algorithm and conversion method from binary data to four bases of DNA.Finally,the existing DNA storage schemes are ana-lyzed,and the challenges in DNA data storage research are discussed.

Key words: Compression coding, Data storage, DNA, Error correction algorithm, Storage density

中图分类号: 

  • TP301
[1]ZHIRNOV V,ZADEGAN R M,SANDHU G S,et al.Nucleic Acid Memory[J].Nature Materials,2016,15(4):366-370.
[2]GODA K,KITSUREGAWA M.The History of Storage Sys-tems[J].Proceedings of the IEEE,2012,100(13):1433-1440.
[3]PANDA D,MOLLA K A,BAIG M J,et al.DNA as a digital information storage device:hope or hype?[J].Biotech,2018,8(5):239-247.
[4]WILLIAMS E D,AYRES R U,HELLER M.The 1.7 Kilogram Microchip:Energy and Material Use in the Production of Semiconductor Devices[J].Environmental Science & Technology,2004,38(6):1915-1916.
[5]EXTANCE A.How DNA could store all the world’s data[J].Nature,2016,537(7618):22-24.
[6]BORNHOLT J,LOPEZ R,CARMEAN D M,et al.A DNA-Based Archival Storage System[C]∥International Conference on Architectural Support for Programming Languages & Ope-rating Systems.2016.
[7]HAKAMI H A,CHACZKO Z,KALE A.Review of Big Data Storage Based on DNA Computing[C]∥Computer Aided System Engineering.IEEE,2015.
[8]CHURCH G M,GAO Y,KOSURI S.Next-Generation Digital Information Storage in DNA[J].Science,2012,337(6102):1628.
[9]SHIPMAN S L,NIVALA J,MACKLIS J D,et al.CRISPR-Cas encoding of a digital movie into the genomes of a population of living bacteria[J].Nature,2017,547(7663):345-349.
[10]ERLICH Y,ZIELINSKI D.DNA Fountain enables a robust and efficient storage architecture[J].Science,2017,355(6328):950-954.
[11]GOLDMAN N,BERTONE P,CHEN S,et al.Towards practical,high-capacity,low-maintenance information storage in synthesized DNA[J].Nature,2013,494(7435):77-80.
[12]BORNHOLT J,LOPEZ R,CARMEAN D M,et al.Toward a DNA-Based Archival Storage System[J].IEEE Micro,2016,PP(99):637-649.
[13]CASTILLO M.From Hard Drives to Flash Drives to DNA Drives[J].American Journal of Neuroradiology,2014,35(1):1-2.
[14]BONNET J,COLOTTE M,COUDY D,et al.Chain and conformation stability of solid-state DNA:Implications of room temperature storage[J].Nucleic Acids Research 2009;38(5):1531-1546.
[15]WANG S W.DNA storage with error correction mechanism [D].Changsha:National University of Defense Technology,2014.(in Chinese)
王诗薇.带纠错机制的DNA存储[D].长沙:国防科学技术大学,2014.
[16]ORLANDO L,GLNOLHAC A,ZHANG G,et al.Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse[J].Nature,2013,499(7456):74-78.
[17]MILLER W,SCHUSTER S C,WELCH A J,et al.Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change[J].Proceedings of The National Academy of Sciences of The United States of America,2012,109(36):E2382-E2390.
[18]KIM S,SOLTIS D E,SOLTIS P S,et al.DNA sequences from Miocene fossils:an ndhF sequence of Magnolia latahensis(Magnoliaceae) and an rbcL sequence of Persea psedocarolinensis(Lauraceae)[J].American Journal of Botany,2004,91(4):615-620.
[19]ALLENTOFT M E,COLLINS M,HARKER D,et al.The half-life of DNA in bone:measuring decay kinetics in 158 dated fossils[J].Proceedings Biological Sciences,2012,279(1748):4724-4733.
[20]HUFFMAN D A.A method for the construction of minimum-redundancy codes[J].Resonance,2006,11(2):91-99.
[21]GIBSON D G,VENTER J C.Creation of a bacterial cell con-trolled by a chemically synthesized genome.[J].Science,2010,329(5987):52-56.
[22]SAVITRI P A I,ADIWIJAYA,MURDIANSYAH D T,et al.Digital medical image compression algorithm using adaptive Huffman coding and graph based quantization based on IWT-SVD[C]∥International Conference on Information & Communication Technology.2016.
[23]PADMAVATI S,MESHARAM V.DCT combined with fractal quadtree decomposition and Huffman coding for image compression[C]∥International Conference on Condition Assessment Techniques in Electrical Systems.2016.
[24]AILENBERG M,ROTSTEIN O.An improved Huffman coding method for archiving text,images,and music characters in DNA[J].BioTechniques,2009,47(3):747-754.
[25]CAIRE G,SHAMAI S,SHOKROLLAHI A,et al.Universal variable-length data compression of binary sources using fountain codes[C]∥Information Theory Workshop.IEEE,2016:123-128.
[26]BLAWAT M,GAEDKE K,HüTTER I,et al.Forward Error Correction for DNA Data Storage [J].Procedia Computer Science,2016,80(5):1011-1022.
[27]BYERS J W,LUBY M,MITZENMACHER M.A digital fountain approach to asynchronous reliable multicast[J].IEEE Journal on Selected Areas in Communications,2002,20(8):1528-1540.
[28]BING L I,ZHANG L,LIU Y.FPGA hardware implementation of the LZMA compression algorithm[J].Journal of Beijing University of Aeronautics & Astronautics,2015,41(3):375-382.
[29]LAN C,XU J,ZENG W,et al.Compound image compression using lossless and lossy LZMA in HEVC[C]∥IEEE Internatio-nal Conference on Multimedia & Expo.IEEE,2015.
[30]JI Z,ZHOU J R,ZHU Z X.Bioinformatics Features Based DNA Sequence DATA Compression Algorithm[J].Acta Electronica Sinica,2011,39(5):991-995.(in Chinese)
纪震,周家锐,朱泽轩.基于生物信息学特征的DNA序列数据压缩算法[J].电子学报,2011,39(5):991-995.
[31]PINHO A J,PRATAS D,FERREIRA P J S G.Bacteria DNA sequence compression using a mixture of finite-context models[C]∥Statistical Signal Processing Workshop.2011.
[32]CAO M D,DIX T I,ALLISON L,et al.A Simple Statistical Algorithm for Biological Sequence Compression[C]∥Data Compression Conference.2007.
[33]YIM K Y,YU C S,LI J W,et al.The Essential Component in DNA-Based Information Storage System:Robust Error-Tolerating Module[J].Frontiers in Bioengineering & Biotechnology,2014,2(2):49-53.
[34]SALOMON D.Data compression:the complete reference[M]∥Data Compression:The Complete Reference.New York:Sprin-ger-Verlag,2000.
[35]SHEN Y F,PAN L.Principle and Method of the Error Detection and Correction of Ternary Hamming Codes [J].Chinese Journal of Computers,2015,38(8):1648-1655.(in Chinese)
沈云付,潘磊.三值汉明码检错纠错原理和方法[J].计算机学报,2015,38(8):1648-1655.
[36]SINGH A K.Error detection and correction by hamming code[C]∥International Conference on Global Trends in Signal Processing.2017.
[37]SONG X M.Research on DNA Information Storage Method Based on Huffman Coding [D].Tianjin:Tianjin University,2018.(in Chinese)
宋香明.基于Huffman编码的DNA信息存储方法研究[D].天津:天津大学,2018.
[38]GRASS R N,HECKEL R,PUDDU M,et al.Robust chemical preservation of digital information on DNA in silica with error-correcting codes[J].Angewandte Chemie International Ed in English,2015,54(8):2552-2555.
[39]WANG L,WEI Z,YANG W,et al.Multiple channel error-correction algorithms for LCC decoding of Reed-Solomon codes and its high-speed architecture design[J].IET Communications,2017,11(9):1407-1415.
[40]ALWAN M H,SINGH M,MAHDI H F.Performance comparison of turbo codes with LDPC codes and with BCH codes for forward error correcting codes[C]∥Research & Development.2016.
[41]BALDI M,MATURO N,RICCIUTELLI G,et al.On the error detection capability of combined LDPC and CRC codes for space telecommand transmissions[C]∥Computers & Communication.2016.
[42]CHRISTY,BOGARD,ERIC,et al.DNA media storage[J].Progress in Natural Science,2008,18(5):603-609.
[43]WONG P C,WONG K K,FOOTE H.Organic data memory using the DNA approach[J].Communications of the ACM,2003,46(1):95-98.
[44]KASHIWAMURA S,YAMAMOTO M,KAMEDA A,et al. Potential for enlarging DNA memory:the validity of experimental operations of scaled-up nested primer molecular memory[J].Biosystems,2005,80(1):99-112.
[45]NOZOMU Y,KAZUHIDE S,JUNICHI S,et al.Alignment-Based Approach for Durable Data Storage into Living Organisms[J].Biotechnology Progress,2007,23(2):501-505.
[46]SCHUSTER S C.Next-generation sequencing transforms today’s biology[J].Nature Methods,2008,5(1):16-18.
[47]SONG Y,KIM S,HELLER M J,et al.DNA multi-bit non-volatile memory and bit-shifting operations using addressable electrode arrays and electric field-induced hybridization[J].Nature Communications,2018,9(1):1-8.
[48]HEAVEN D.Now we can store video in living DNA[J].New Scientist,2017,235(3134):11-14.
[49]YAZDI S M H T,GABRYS R,MILENKOVIC O.Portable and Error-Free DNA-Based Data Storage[J].Scientific Reports,2017,7(1):1-4.
[50]BLAWAT M,GAEDKE K,HÜTTER I,et al.Forward Error Correction for DNA Data Storage [J].Procedia Computer Science,2016,80(5):1011-1022.
[51]JIANG L,QIU W,ALDIRINI F,et al.Feasibility study of molecular memory device based on DNA using methylation to store information[J].Journal of Applied Physics,2016,120(2):96-100.
[52]YAZDI S M H T,KIAH H M,GARCIA E R,et al.DNA-Based Storage:Trends and Methods[J].IEEE Transactions on Mole-cular,Biological and Multi-Scale Communications,2015,1(3):230-248.
[53]TABATABAEI Y S M H,YUAN Y,MA J,et al.A Rewritable,Random-Access DNA-Based Storage System[J].Scientific Reports,2015,5(9):1-10.
[54]FATIMA A,UL H I,HAIDER A,et al.Trends to store digital data in DNA:an overview[J].Molecular Biology Reports,2018,45(5):1479-1490.
[55]JAIN S,FARNOUD F,SCHWARTZ M,et al.Duplication-correcting codes for data storage in the DNA of living organisms[J].IEEE Transactions on Information Theory,2016,PP(99):1.
[56]张淑芳,宋香明.一种高存储密度的 DNA 信息存储编码方案:201811445344.8[P].2018.
[57]SILVA P Y D,GANEGODA G U.New Trends of Digital Data Storage in DNA[J].Biomed Research International,2016,2016(5536):1-14.
[58]CARR P A,CHURCH G M.Genome engineering[J].Nature Biotechnology,2009,27(12):1151-1162.
[59]SHENDURE J,LIEBERMAN A E.The expanding scope of DNA sequencing[J].Nature Biotechnology,2012,30(11):1084 [60]PENNISI E.Genome sequencing.Search for pore-fection[J]. Science,2012,336(6081):534-537.
[61]KOSURI S,CHURCH G M.Large-scale de novo DNA synthesis:technologies and applications[J].Nature Methods,2014,11(5):499-507.
[1] 王坤姝, 张泽辉, 高铁杠.
基于Hachimoji DNA和QR分解的遥感图像可逆隐藏算法
Reversible Hidden Algorithm for Remote Sensing Images Based on Hachimoji DNA and QR Decomposition
计算机科学, 2022, 49(8): 127-135. https://doi.org/10.11896/jsjkx.210700216
[2] 孙福权, 梁莹.
基于XGBoost算法的水稻基因组6mA位点识别研究
Identification of 6mA Sites in Rice Genome Based on XGBoost Algorithm
计算机科学, 2022, 49(6A): 309-313. https://doi.org/10.11896/jsjkx.210700262
[3] 陈伟, 李杭, 李维华.
核小体定位预测的集成学习方法
Ensemble Learning Method for Nucleosome Localization Prediction
计算机科学, 2022, 49(2): 285-291. https://doi.org/10.11896/jsjkx.201100195
[4] 吴立波, 黄玉芳.
基于DNA链置换的逻辑推理问题研究
Logical Reasoning Based on DNA Strand Displacement
计算机科学, 2022, 49(1): 259-263. https://doi.org/10.11896/jsjkx.210200131
[5] 杜流云, 郑智捷, 郑华仙.
厚壁菌门下两类细菌的DNA全序列可视化研究
Visualization of DNA Sequences of Two Kinds of Bacteria Under Firmicutes
计算机科学, 2020, 47(11A): 192-195. https://doi.org/10.11896/jsjkx.191200070
[6] 汪洋, 李鹏, 季一木, 樊卫北, 张玉杰, 王汝传, 陈国良.
高性能计算与天文大数据研究综述
High Performance Computing and Astronomical Data:A Survey
计算机科学, 2020, 47(1): 1-6. https://doi.org/10.11896/jsjkx.190900042
[7] 韩英杰, 周清雷, 朱维军.
基于DNA计算的计算树逻辑模型检测方法研究进展
Survey on DNA-computing Based Methods of Computation Tree Logic Model Checking
计算机科学, 2019, 46(11): 25-31. https://doi.org/10.11896/jsjkx.181102091
[8] 宫法明,李翛然.
基于Neo4j的海量石油领域本体数据存储研究
Research on Ontology Data Storage of Massive Oil Field Based on Neo4j
计算机科学, 2018, 45(6A): 549-554.
[9] 贾鑫,张少平.
基于贪婪策略的NAND FLASH存储器的磨损均衡算法研究
Research on Wear Leveling Algorithm of NAND FLASH Memory Based on Greedy Strategy
计算机科学, 2017, 44(Z11): 312-316. https://doi.org/10.11896/j.issn.1002-137X.2017.11A.066
[10] 郭梅,袁栋,杨耘.
云计算环境下低成本存储科学数据的演化CTT-SP算法
Evolutionary CTT-SP Algorithm for Cost-effectively Storing Scientific Datasets in Cloud
计算机科学, 2017, 44(12): 163-168. https://doi.org/10.11896/j.issn.1002-137X.2017.12.031
[11] 布海力切木·阿吾冬,李国东.
基于DNA编码与正弦混沌映射的气象图加密技术研究
Study on DNA Encoding & Sine Chaos-based Meteorological Image Encryption Technology
计算机科学, 2016, 43(Z11): 403-406. https://doi.org/10.11896/j.issn.1002-137X.2016.11A.092
[12] 黄冬梅,赵丹枫,魏立斐,杜艳玲,王振华.
大数据背景下海洋数据管理的挑战与对策
Managing Marine Data as Big Data:Uprising Challenges and Tentative Solutions
计算机科学, 2016, 43(6): 17-23. https://doi.org/10.11896/j.issn.1002-137X.2016.06.003
[13] 冉娟,李晓宇.
基于秘密共享协议的移动数据存储研究
Mobile Data Storage Solution Based on Secret Sharing Protocol
计算机科学, 2016, 43(4): 145-149. https://doi.org/10.11896/j.issn.1002-137X.2016.04.029
[14] 邓晓军,欧阳旻,李玉龙.
基于虚拟环的无线传感网节能数据存储与查询机制
Energy-efficient Wireless Sensor Network Data Storage and Query Mechanism Based on Virtual Ring
计算机科学, 2015, 42(8): 132-135.
[15] 闵 林,樊卫北,郭拯危,凡高娟.
基于RCFile的无线传感器网络数据存储策略研究
Wireless Sensor Networks Data Storage Strategy Based on RCFile
计算机科学, 2015, 42(4): 76-80. https://doi.org/10.11896/j.issn.1002-137X.2015.04.014
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!