New Levenshtein-Marker Code for DNA-based Data Storage Capable of
Correcting Multiple Edit Errors
- Zihui Yan
- Cong Liang
Abstract
With the development of DNA synthesis and sequencing technologies, DNA
has become a promising medium for long-term data storage. Three types of
errors may occur in a DNA strand: insertions, deletions, and
substitutions, which we collectively call edit errors. It is still
challenging to design a code that can correct multiple edit errors
over non-binary alphabets. In this paper, we propose a new coding scheme
for correcting multiple edit errors in DNA strands by splitting the whole
strand into consecutive blocks of appropriate length and correcting a
single edit error in each block. Our method, called the DNA-LM code, can
be considered a generalization of the Levenshtein code combined with the
marker code. We provide a linear encoding and decoding algorithm for
the DNA-LM code. Compared to other encoding methods for DNA strands of
several hundred base pairs, the DNA-LM code achieves similar code rates and
a much lower average nucleotide error rate in decoding.
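
For readers unfamiliar with the Levenshtein code mentioned above, the following is a minimal sketch of the classical binary construction (also known as the Varshamov-Tenengolts code) and its single-deletion decoder. It is not the DNA-LM code itself: it works over a binary alphabet rather than the four nucleotides, handles only one deletion per block, and omits markers. The function names and the parameter `a` are illustrative assumptions.

```python
def vt_syndrome(bits):
    """Weighted checksum sum(i * x_i) mod (n + 1), with positions i starting at 1."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_correct_deletion(received, n, a=0):
    """Recover a length-n codeword with syndrome a from a copy missing one bit.

    Assumes the original word satisfied vt_syndrome(word) == a and that
    exactly one bit was deleted (illustrative single-block case).
    """
    if len(received) != n - 1:
        raise ValueError("expected exactly one deletion")
    w = sum(received)  # number of ones that survived
    s = sum(i * b for i, b in enumerate(received, start=1))
    d = (a - s) % (n + 1)  # checksum deficiency caused by the deletion

    if d <= w:
        # A 0 was deleted: reinsert it so that exactly d ones lie to its right.
        pos, ones = len(received), 0
        while ones < d:
            pos -= 1
            ones += received[pos]
        return received[:pos] + [0] + received[pos:]

    # A 1 was deleted: reinsert it so that exactly d - w - 1 zeros lie to its left.
    zeros_needed = d - w - 1
    pos, zeros = 0, 0
    while zeros < zeros_needed:
        zeros += 1 - received[pos]
        pos += 1
    return received[:pos] + [1] + received[pos:]
```

In a block-based scheme of the kind the abstract describes, a decoder of this sort would be applied to each block independently, so that one edit error per block can be tolerated; how block boundaries are located and how the idea extends to quaternary symbols, insertions, and substitutions is specific to the paper and not shown here.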