TechRxiv

New Levenshtein-Marker Code for DNA-based Data Storage Capable of Correcting Multiple Edit Errors

preprint
posted on 2021-09-08, 06:06, authored by Zihui Yan, Cong Liang
With the development of DNA synthesis and sequencing technologies, DNA has become a promising medium for long-term data storage. Three types of errors may occur in a DNA strand: insertions, deletions, and substitutions, which we collectively call edit errors. It is still challenging to design a code that can correct multiple edit errors on non-binary alphabets. In this paper, we propose a new coding scheme for correcting multiple edit errors on DNA strands by splitting the whole strand into consecutive blocks of appropriate length and correcting a single edit error in each block. Our method, called the DNA-LM code, can be considered a generalization of the Levenshtein code combined with the marker code. We provide a linear encoding and decoding algorithm for our DNA-LM code. Compared to other encoding methods for DNA strands of several hundred base pairs, our DNA-LM code achieved similar code rates and a much lower average nucleotide error rate in decoding.
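
The idea of correcting one error per block can be illustrated with a much simpler stand-in. The sketch below is not the DNA-LM code: it uses the classic binary Varshamov-Tenengolts (VT) checksum, which is known to correct a single deletion, and a brute-force decoder rather than a linear one. It also assumes that block boundaries are already known, which is the synchronization problem a marker code would have to address. All function names (`vt_syndrome`, `vt_codewords`, `correct_single_deletion`, `decode_blocks`) are hypothetical and chosen only for this illustration.

```python
from itertools import product


def vt_syndrome(bits):
    """Weighted checksum sum(i * x_i) mod (n + 1), with positions counted from 1."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_codewords(n, a=0):
    """All binary words of length n whose VT checksum equals a."""
    return [w for w in product((0, 1), repeat=n) if vt_syndrome(w) == a]


def correct_single_deletion(received, n, a=0):
    """Recover a length-n VT codeword from a block that lost at most one bit.

    Brute force: re-insert 0 or 1 at every position and keep the candidates
    whose checksum matches. For a VT code exactly one distinct candidate survives.
    """
    received = tuple(received)
    if len(received) == n:
        return received  # assumption: a full-length block has no error
    if len(received) != n - 1:
        raise ValueError("this sketch handles at most one deletion per block")
    candidates = set()
    for pos in range(n):
        for bit in (0, 1):
            cand = received[:pos] + (bit,) + received[pos:]
            if vt_syndrome(cand) == a:
                candidates.add(cand)
    (codeword,) = candidates  # unique by the single-deletion property of VT codes
    return codeword


def decode_blocks(received_blocks, n, a=0):
    """Decode each block independently (block boundaries assumed known)."""
    return [correct_single_deletion(block, n, a) for block in received_blocks]
```

For example, `(1, 0, 0, 1)` has checksum 1 + 4 = 5, which is 0 mod 5, so it is a codeword for `n=4, a=0`; deleting its first bit gives `(0, 0, 1)`, and `correct_single_deletion((0, 0, 1), 4)` restores the original block. Because each block absorbs one error on its own, a strand made of several such blocks can tolerate several deletions as long as no block receives more than one.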

Email Address of Submitting Author

yanzh@tju.edu.cn

Submitting Author's Institution

Tianjin University

Submitting Author's Country

  • China