计算机科学 ›› 2019, Vol. 46 ›› Issue (6): 21-28.doi: 10.11896/j.issn.1002-137X.2019.06.002
张淑芳1, 彭康1, 宋香明1, 张子昱2, 王汉杰3
ZHANG Shu-fang1, PENG Kang1, SONG Xiang-ming1, ZHANG Zi-yu2, WANG Han-jie3
摘要: 随着计算机技术和网络技术的飞速发展,由此产生的海量数据给传统数据存储方式带来了巨大挑战,因此研究人员开始致力于寻找新一代存储方案。脱氧核糖核酸(Deoxyribonucleic Acid,DNA)作为天然的遗传信息存储介质,具有存储容量大、能耗低和寿命长等优点,有效克服了传统硬盘和计算机存储等方式的不足,故DNA数据存储技术成为信息技术和生物技术交叉领域的研究热点。文中综述了DNA数据存储技术的研究进展,首先对DNA及其存储的理论框架进行了介绍;然后详细阐述了DNA数据存储中的编码技术:二进制数据的压缩编码算法、纠错算法以及二进制数据到DNA 4种碱基的转换方法;最后对现阶段已有的DNA存储方案进行了分析,并对DNA数据存储研究存在的挑战进行了讨论。
中图分类号:
[1]ZHIRNOV V,ZADEGAN R M,SANDHU G S,et al.Nucleic Acid Memory[J].Nature Materials,2016,15(4):366-370. [2]GODA K,KITSUREGAWA M.The History of Storage Sys-tems[J].Proceedings of the IEEE,2012,100(13):1433-1440. [3]PANDA D,MOLLA K A,BAIG M J,et al.DNA as a digital information storage device:hope or hype?[J].Biotech,2018,8(5):239-247. [4]WILLIAMS E D,AYRES R U,HELLER M.The 1.7 Kilogram Microchip:Energy and Material Use in the Production of Semiconductor Devices[J].Environmental Science & Technology,2004,38(6):1915-1916. [5]EXTANCE A.How DNA could store all the world’s data[J].Nature,2016,537(7618):22-24. [6]BORNHOLT J,LOPEZ R,CARMEAN D M,et al.A DNA-Based Archival Storage System[C]∥International Conference on Architectural Support for Programming Languages & Ope-rating Systems.2016. [7]HAKAMI H A,CHACZKO Z,KALE A.Review of Big Data Storage Based on DNA Computing[C]∥Computer Aided System Engineering.IEEE,2015. [8]CHURCH G M,GAO Y,KOSURI S.Next-Generation Digital Information Storage in DNA[J].Science,2012,337(6102):1628. [9]SHIPMAN S L,NIVALA J,MACKLIS J D,et al.CRISPR-Cas encoding of a digital movie into the genomes of a population of living bacteria[J].Nature,2017,547(7663):345-349. [10]ERLICH Y,ZIELINSKI D.DNA Fountain enables a robust and efficient storage architecture[J].Science,2017,355(6328):950-954. [11]GOLDMAN N,BERTONE P,CHEN S,et al.Towards practical,high-capacity,low-maintenance information storage in synthesized DNA[J].Nature,2013,494(7435):77-80. [12]BORNHOLT J,LOPEZ R,CARMEAN D M,et al.Toward a DNA-Based Archival Storage System[J].IEEE Micro,2016,PP(99):637-649. [13]CASTILLO M.From Hard Drives to Flash Drives to DNA Drives[J].American Journal of Neuroradiology,2014,35(1):1-2. [14]BONNET J,COLOTTE M,COUDY D,et al.Chain and conformation stability of solid-state DNA:Implications of room temperature storage[J].Nucleic Acids Research 2009;38(5):1531-1546. [15]WANG S W.DNA storage with error correction mechanism [D].Changsha:National University of Defense Technology,2014.(in Chinese) 王诗薇.带纠错机制的DNA存储[D].长沙:国防科学技术大学,2014. [16]ORLANDO L,GLNOLHAC A,ZHANG G,et al.Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse[J].Nature,2013,499(7456):74-78. [17]MILLER W,SCHUSTER S C,WELCH A J,et al.Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change[J].Proceedings of The National Academy of Sciences of The United States of America,2012,109(36):E2382-E2390. [18]KIM S,SOLTIS D E,SOLTIS P S,et al.DNA sequences from Miocene fossils:an ndhF sequence of Magnolia latahensis(Magnoliaceae) and an rbcL sequence of Persea psedocarolinensis(Lauraceae)[J].American Journal of Botany,2004,91(4):615-620. [19]ALLENTOFT M E,COLLINS M,HARKER D,et al.The half-life of DNA in bone:measuring decay kinetics in 158 dated fossils[J].Proceedings Biological Sciences,2012,279(1748):4724-4733. [20]HUFFMAN D A.A method for the construction of minimum-redundancy codes[J].Resonance,2006,11(2):91-99. [21]GIBSON D G,VENTER J C.Creation of a bacterial cell con-trolled by a chemically synthesized genome.[J].Science,2010,329(5987):52-56. [22]SAVITRI P A I,ADIWIJAYA,MURDIANSYAH D T,et al.Digital medical image compression algorithm using adaptive Huffman coding and graph based quantization based on IWT-SVD[C]∥International Conference on Information & Communication Technology.2016. [23]PADMAVATI S,MESHARAM V.DCT combined with fractal quadtree decomposition and Huffman coding for image compression[C]∥International Conference on Condition Assessment Techniques in Electrical Systems.2016. [24]AILENBERG M,ROTSTEIN O.An improved Huffman coding method for archiving text,images,and music characters in DNA[J].BioTechniques,2009,47(3):747-754. [25]CAIRE G,SHAMAI S,SHOKROLLAHI A,et al.Universal variable-length data compression of binary sources using fountain codes[C]∥Information Theory Workshop.IEEE,2016:123-128. [26]BLAWAT M,GAEDKE K,HüTTER I,et al.Forward Error Correction for DNA Data Storage [J].Procedia Computer Science,2016,80(5):1011-1022. [27]BYERS J W,LUBY M,MITZENMACHER M.A digital fountain approach to asynchronous reliable multicast[J].IEEE Journal on Selected Areas in Communications,2002,20(8):1528-1540. [28]BING L I,ZHANG L,LIU Y.FPGA hardware implementation of the LZMA compression algorithm[J].Journal of Beijing University of Aeronautics & Astronautics,2015,41(3):375-382. [29]LAN C,XU J,ZENG W,et al.Compound image compression using lossless and lossy LZMA in HEVC[C]∥IEEE Internatio-nal Conference on Multimedia & Expo.IEEE,2015. [30]JI Z,ZHOU J R,ZHU Z X.Bioinformatics Features Based DNA Sequence DATA Compression Algorithm[J].Acta Electronica Sinica,2011,39(5):991-995.(in Chinese) 纪震,周家锐,朱泽轩.基于生物信息学特征的DNA序列数据压缩算法[J].电子学报,2011,39(5):991-995. [31]PINHO A J,PRATAS D,FERREIRA P J S G.Bacteria DNA sequence compression using a mixture of finite-context models[C]∥Statistical Signal Processing Workshop.2011. [32]CAO M D,DIX T I,ALLISON L,et al.A Simple Statistical Algorithm for Biological Sequence Compression[C]∥Data Compression Conference.2007. [33]YIM K Y,YU C S,LI J W,et al.The Essential Component in DNA-Based Information Storage System:Robust Error-Tolerating Module[J].Frontiers in Bioengineering & Biotechnology,2014,2(2):49-53. [34]SALOMON D.Data compression:the complete reference[M]∥Data Compression:The Complete Reference.New York:Sprin-ger-Verlag,2000. [35]SHEN Y F,PAN L.Principle and Method of the Error Detection and Correction of Ternary Hamming Codes [J].Chinese Journal of Computers,2015,38(8):1648-1655.(in Chinese) 沈云付,潘磊.三值汉明码检错纠错原理和方法[J].计算机学报,2015,38(8):1648-1655. [36]SINGH A K.Error detection and correction by hamming code[C]∥International Conference on Global Trends in Signal Processing.2017. [37]SONG X M.Research on DNA Information Storage Method Based on Huffman Coding [D].Tianjin:Tianjin University,2018.(in Chinese) 宋香明.基于Huffman编码的DNA信息存储方法研究[D].天津:天津大学,2018. [38]GRASS R N,HECKEL R,PUDDU M,et al.Robust chemical preservation of digital information on DNA in silica with error-correcting codes[J].Angewandte Chemie International Ed in English,2015,54(8):2552-2555. [39]WANG L,WEI Z,YANG W,et al.Multiple channel error-correction algorithms for LCC decoding of Reed-Solomon codes and its high-speed architecture design[J].IET Communications,2017,11(9):1407-1415. [40]ALWAN M H,SINGH M,MAHDI H F.Performance comparison of turbo codes with LDPC codes and with BCH codes for forward error correcting codes[C]∥Research & Development.2016. [41]BALDI M,MATURO N,RICCIUTELLI G,et al.On the error detection capability of combined LDPC and CRC codes for space telecommand transmissions[C]∥Computers & Communication.2016. [42]CHRISTY,BOGARD,ERIC,et al.DNA media storage[J].Progress in Natural Science,2008,18(5):603-609. [43]WONG P C,WONG K K,FOOTE H.Organic data memory using the DNA approach[J].Communications of the ACM,2003,46(1):95-98. [44]KASHIWAMURA S,YAMAMOTO M,KAMEDA A,et al. Potential for enlarging DNA memory:the validity of experimental operations of scaled-up nested primer molecular memory[J].Biosystems,2005,80(1):99-112. [45]NOZOMU Y,KAZUHIDE S,JUNICHI S,et al.Alignment-Based Approach for Durable Data Storage into Living Organisms[J].Biotechnology Progress,2007,23(2):501-505. [46]SCHUSTER S C.Next-generation sequencing transforms today’s biology[J].Nature Methods,2008,5(1):16-18. [47]SONG Y,KIM S,HELLER M J,et al.DNA multi-bit non-volatile memory and bit-shifting operations using addressable electrode arrays and electric field-induced hybridization[J].Nature Communications,2018,9(1):1-8. [48]HEAVEN D.Now we can store video in living DNA[J].New Scientist,2017,235(3134):11-14. [49]YAZDI S M H T,GABRYS R,MILENKOVIC O.Portable and Error-Free DNA-Based Data Storage[J].Scientific Reports,2017,7(1):1-4. [50]BLAWAT M,GAEDKE K,HÜTTER I,et al.Forward Error Correction for DNA Data Storage [J].Procedia Computer Science,2016,80(5):1011-1022. [51]JIANG L,QIU W,ALDIRINI F,et al.Feasibility study of molecular memory device based on DNA using methylation to store information[J].Journal of Applied Physics,2016,120(2):96-100. [52]YAZDI S M H T,KIAH H M,GARCIA E R,et al.DNA-Based Storage:Trends and Methods[J].IEEE Transactions on Mole-cular,Biological and Multi-Scale Communications,2015,1(3):230-248. [53]TABATABAEI Y S M H,YUAN Y,MA J,et al.A Rewritable,Random-Access DNA-Based Storage System[J].Scientific Reports,2015,5(9):1-10. [54]FATIMA A,UL H I,HAIDER A,et al.Trends to store digital data in DNA:an overview[J].Molecular Biology Reports,2018,45(5):1479-1490. [55]JAIN S,FARNOUD F,SCHWARTZ M,et al.Duplication-correcting codes for data storage in the DNA of living organisms[J].IEEE Transactions on Information Theory,2016,PP(99):1. [56]张淑芳,宋香明.一种高存储密度的 DNA 信息存储编码方案:201811445344.8[P].2018. [57]SILVA P Y D,GANEGODA G U.New Trends of Digital Data Storage in DNA[J].Biomed Research International,2016,2016(5536):1-14. [58]CARR P A,CHURCH G M.Genome engineering[J].Nature Biotechnology,2009,27(12):1151-1162. [59]SHENDURE J,LIEBERMAN A E.The expanding scope of DNA sequencing[J].Nature Biotechnology,2012,30(11):1084 [60]PENNISI E.Genome sequencing.Search for pore-fection[J]. Science,2012,336(6081):534-537. [61]KOSURI S,CHURCH G M.Large-scale de novo DNA synthesis:technologies and applications[J].Nature Methods,2014,11(5):499-507. |
[1] | 王坤姝, 张泽辉, 高铁杠. 基于Hachimoji DNA和QR分解的遥感图像可逆隐藏算法 Reversible Hidden Algorithm for Remote Sensing Images Based on Hachimoji DNA and QR Decomposition 计算机科学, 2022, 49(8): 127-135. https://doi.org/10.11896/jsjkx.210700216 |
[2] | 孙福权, 梁莹. 基于XGBoost算法的水稻基因组6mA位点识别研究 Identification of 6mA Sites in Rice Genome Based on XGBoost Algorithm 计算机科学, 2022, 49(6A): 309-313. https://doi.org/10.11896/jsjkx.210700262 |
[3] | 陈伟, 李杭, 李维华. 核小体定位预测的集成学习方法 Ensemble Learning Method for Nucleosome Localization Prediction 计算机科学, 2022, 49(2): 285-291. https://doi.org/10.11896/jsjkx.201100195 |
[4] | 吴立波, 黄玉芳. 基于DNA链置换的逻辑推理问题研究 Logical Reasoning Based on DNA Strand Displacement 计算机科学, 2022, 49(1): 259-263. https://doi.org/10.11896/jsjkx.210200131 |
[5] | 杜流云, 郑智捷, 郑华仙. 厚壁菌门下两类细菌的DNA全序列可视化研究 Visualization of DNA Sequences of Two Kinds of Bacteria Under Firmicutes 计算机科学, 2020, 47(11A): 192-195. https://doi.org/10.11896/jsjkx.191200070 |
[6] | 汪洋, 李鹏, 季一木, 樊卫北, 张玉杰, 王汝传, 陈国良. 高性能计算与天文大数据研究综述 High Performance Computing and Astronomical Data:A Survey 计算机科学, 2020, 47(1): 1-6. https://doi.org/10.11896/jsjkx.190900042 |
[7] | 韩英杰, 周清雷, 朱维军. 基于DNA计算的计算树逻辑模型检测方法研究进展 Survey on DNA-computing Based Methods of Computation Tree Logic Model Checking 计算机科学, 2019, 46(11): 25-31. https://doi.org/10.11896/jsjkx.181102091 |
[8] | 宫法明,李翛然. 基于Neo4j的海量石油领域本体数据存储研究 Research on Ontology Data Storage of Massive Oil Field Based on Neo4j 计算机科学, 2018, 45(6A): 549-554. |
[9] | 贾鑫,张少平. 基于贪婪策略的NAND FLASH存储器的磨损均衡算法研究 Research on Wear Leveling Algorithm of NAND FLASH Memory Based on Greedy Strategy 计算机科学, 2017, 44(Z11): 312-316. https://doi.org/10.11896/j.issn.1002-137X.2017.11A.066 |
[10] | 郭梅,袁栋,杨耘. 云计算环境下低成本存储科学数据的演化CTT-SP算法 Evolutionary CTT-SP Algorithm for Cost-effectively Storing Scientific Datasets in Cloud 计算机科学, 2017, 44(12): 163-168. https://doi.org/10.11896/j.issn.1002-137X.2017.12.031 |
[11] | 布海力切木·阿吾冬,李国东. 基于DNA编码与正弦混沌映射的气象图加密技术研究 Study on DNA Encoding & Sine Chaos-based Meteorological Image Encryption Technology 计算机科学, 2016, 43(Z11): 403-406. https://doi.org/10.11896/j.issn.1002-137X.2016.11A.092 |
[12] | 黄冬梅,赵丹枫,魏立斐,杜艳玲,王振华. 大数据背景下海洋数据管理的挑战与对策 Managing Marine Data as Big Data:Uprising Challenges and Tentative Solutions 计算机科学, 2016, 43(6): 17-23. https://doi.org/10.11896/j.issn.1002-137X.2016.06.003 |
[13] | 冉娟,李晓宇. 基于秘密共享协议的移动数据存储研究 Mobile Data Storage Solution Based on Secret Sharing Protocol 计算机科学, 2016, 43(4): 145-149. https://doi.org/10.11896/j.issn.1002-137X.2016.04.029 |
[14] | 邓晓军,欧阳旻,李玉龙. 基于虚拟环的无线传感网节能数据存储与查询机制 Energy-efficient Wireless Sensor Network Data Storage and Query Mechanism Based on Virtual Ring 计算机科学, 2015, 42(8): 132-135. |
[15] | 闵 林,樊卫北,郭拯危,凡高娟. 基于RCFile的无线传感器网络数据存储策略研究 Wireless Sensor Networks Data Storage Strategy Based on RCFile 计算机科学, 2015, 42(4): 76-80. https://doi.org/10.11896/j.issn.1002-137X.2015.04.014 |
|