Computer Science, 2019, Vol. 46, Issue (6): 21-28. doi: 10.11896/j.issn.1002-137X.2019.06.002


Research Progress on DNA Data Storage Technology

ZHANG Shu-fang¹, PENG Kang¹, SONG Xiang-ming¹, ZHANG Zi-yu², WANG Han-jie³

¹ School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
² Tianjin International Engineering Institute, Tianjin 300072, China
³ School of Life Sciences, Tianjin University, Tianjin 300072, China
Received: 2018-12-23; Published: 2019-06-24

Abstract: With the rapid development of computer and network technology, the massive volume of data now being generated poses great challenges to traditional data storage methods, and researchers have begun to look for a new generation of storage schemes. As a natural medium for storing genetic information, deoxyribonucleic acid (DNA) offers large storage capacity, low energy consumption and a long lifetime, advantages that address the shortcomings of traditional storage media such as hard disks and other computer storage devices. DNA data storage has therefore become a research hotspot at the intersection of information technology and biotechnology. This paper reviews research progress on DNA data storage technology. First, DNA and the theoretical framework of DNA storage are introduced. Then, the coding technologies used in DNA data storage are described in detail, including compression coding algorithms for binary data, error correction algorithms, and methods for converting binary data into the four DNA bases. Finally, existing DNA storage schemes are analyzed, and the open challenges in DNA data storage research are discussed.
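To make the last coding step above concrete, the sketch below shows the simplest possible conversion from binary data to the four DNA bases: every 2 bits map to one nucleotide, so each byte becomes 4 bases. The specific mapping (00→A, 01→C, 10→G, 11→T) and the helper names `bytes_to_dna`, `dna_to_bytes` and `longest_homopolymer` are illustrative assumptions, not the scheme of any particular paper reviewed here. Practical schemes add biochemical constraints on top of such a mapping, for example avoiding long runs of the same base (homopolymers), which the last helper measures.

```python
# Minimal sketch, assuming a direct 2-bits-per-base mapping (hypothetical,
# not the paper's scheme). Real DNA storage codes add constraints on top.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as a DNA string: 8 bits -> 4 bases (2 bits per base)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into the original bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4 bases")
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty strand)."""
    longest = run = 0
    previous = None
    for base in strand:
        run = run + 1 if base == previous else 1
        previous = base
        longest = max(longest, run)
    return longest


if __name__ == "__main__":
    strand = bytes_to_dna(b"Hi")
    print(strand)                       # CAGACGGC
    print(dna_to_bytes(strand))         # b'Hi'
    print(longest_homopolymer(strand))  # 2
```

The weakness of this direct mapping shows up quickly: a zero byte encodes to `AAAA`, a homopolymer of length 4, and long runs of zeros produce arbitrarily long runs of `A`. This is one reason the storage schemes surveyed in the paper use more elaborate conversion and error correction methods rather than a fixed bit-to-base table.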

Key words: Compression coding, Data storage, DNA, Error correction algorithm, Storage density

CLC Number: TP301