New Levenshtein-Marker Code for DNA-based Data Storage Capable of Correcting Multiple Edit Errors
With the development of DNA synthesis and sequencing technologies, DNA has become a promising medium for long-term data storage. Three types of errors may occur in a DNA strand: insertions, deletions, and substitutions, which we collectively call edit errors. Designing a code that can correct multiple edit errors over non-binary alphabets remains challenging. In this paper, we propose a new coding scheme for correcting multiple edit errors on DNA strands by splitting the whole strand into consecutive blocks of appropriate length and correcting a single edit error in each block. Our method, called the DNA-LM code, can be viewed as a generalization of the Levenshtein code combined with the marker code. We provide a linear encoding and decoding algorithm for our DNA-LM code. Compared to other encoding methods for DNA strands of several hundred base pairs, our DNA-LM code achieves similar code rates and a much lower average nucleotide error rate in decoding.
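To make the per-block building block concrete, the following is a minimal sketch of the classical binary Levenshtein (Varshamov-Tenengolts) code, which corrects a single deletion. It is an illustration only and not the DNA-LM construction itself: the DNA-LM code works over the four-letter DNA alphabet and adds markers between blocks, neither of which is shown here. The function names `vt_syndrome`, `vt_codebook`, and `correct_single_deletion` are hypothetical, and the brute-force decoder is chosen for clarity rather than efficiency.

```python
from itertools import product


def vt_syndrome(bits):
    """Checksum sum(i * x_i) mod (n + 1), with positions i counted from 1."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_codebook(n, a=0):
    """All binary words of length n whose checksum equals a."""
    return [w for w in product((0, 1), repeat=n) if vt_syndrome(w) == a]


def correct_single_deletion(received, a=0):
    """Brute-force decoder (illustrative assumption, not a linear-time method).

    Tries every single-bit insertion into `received` and keeps the words
    whose checksum equals a. For a word obtained from a codeword by one
    deletion, exactly one candidate survives.
    """
    received = tuple(received)
    candidates = set()
    for pos in range(len(received) + 1):
        for bit in (0, 1):
            word = received[:pos] + (bit,) + received[pos:]
            if vt_syndrome(word) == a:
                candidates.add(word)
    return candidates


if __name__ == "__main__":
    codeword = vt_codebook(6)[3]
    damaged = codeword[:2] + codeword[3:]  # delete the third bit
    print(codeword, damaged, correct_single_deletion(damaged))
```

In a block-based scheme of the kind described above, a decoder of this type would be applied to each block separately, so that one error per block can be tolerated across the whole strand.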