In-Memory Hamming Error-Correcting Code in Memristor Crossbar

This paper proposes an in-memory Hamming error-correcting code (ECC) in a memristor crossbar array (CBA). Based on the unique I-V characteristic of the complementary resistive switching (CRS) memristor, this work finds that a combination of three memristors behaves as a stateful exclusive-OR (XOR) logic device. In addition, a two-step (build-up and fire) current-mode CBA driving scheme is proposed to realize a linear increment of the build-up voltage that is proportional to the number of low-resistance state (LRS) memristors in the array. Combining the proposed XOR logic device and the driving scheme, we realize a complete stateful XOR logic, which enables a fully functional in-memory Hamming ECC, including parity bit generation and storage followed by syndrome vector calculation and readout. The proposed technique is verified by simulation program with integrated circuit emphasis (SPICE) simulations with a Verilog-A CRS memristor model and a commercial 45-nm CMOS process design kit (PDK). The verification results show that the proposed in-memory ECC detects errors with sufficient margin regardless of data pattern and error location.


I. INTRODUCTION
Memristor crossbars are expected to make significant contributions to future computing systems in various ways, such as high-density and low-latency non-volatile memories [1]-[6], neural networks [7]-[14], and process-in-memory (PIM) [15]-[20]. However, device reliability is one of the main obstacles to their realization in a practical computing system [12]. For example, there is an inherent trade-off between the performance and the reliability of a memristor device, so sacrificing reliability is inevitable when the memory operates at speed. The issue is aggravated further in logic operations for neural network and PIM applications, where the access rate increases by orders of magnitude.
Error-correcting code (ECC) is an essential technique in memory technology that significantly enhances data reliability. Hamming ECC, first proposed in 1950 [21], is one of the most popular ECCs and is still widely used to address reliability issues in various legacy memories [22]-[26]. Parity bits are stored in the memory array along with the data bits during data programming, and are used to tell whether any bit flips, or errors, have occurred during a data readout. In general, an ECC engine located outside the memory array calculates the parity at programming and decodes the ECC at readout to detect and correct the error, so that clean data is sent to the user. Fig. 1 gives a brief explanation of how a Hamming ECC for 4-bit data (d1-d4) works, the so-called Hamming (7,4) ECC. Here, the entire codeword consists of 7 bits, including 3-bit parity and 4-bit data. Hamming ECC requires even or odd parity generation during programming, where the count of '1's in the data-plus-parity code always becomes either even or odd. Therefore, it starts by creating three parity bits (p1-p3) from three different combinations of data bits based on the exclusive-OR (XOR) logic operation, which are grouped into the different circles of Fig. 1(a). In this case, the XOR logic outputs '1' if the number of '1's is odd in any of the three datasets. Therefore, the total number of '1's in a circle of Fig. 1(a), which comprises 3-bit data and the corresponding 1-bit parity (e.g., d1, d2, d4, and p1 in the uppermost circle), should always be even. In other words, if we take an XOR of the 4 bits in a circle during a readout, it should be zero if there has been no bit corruption. Based on this observation, the Hamming ECC takes those XORs as a syndrome vector to point exactly to where the error has occurred among the 4 data bits, as shown in Fig. 1(b). Fig. 1(c) and (d) show an example. With d1 through d4 being 1001, p1 through p3 become 001. With no error, the syndrome vector calculation must give (0, 0, 0). However, when an error occurs at d1, the XORs of the p1 and p2 circles give 1 instead of 0 and lead to the syndrome vector of (1, 1, 0), which informs the ECC that d1 should be corrected before the data is sent to the user. Similarly, the syndrome vectors of (1, 0, 1), (0, 1, 1), and (1, 1, 1) point to an error at d2, d3, and d4, respectively.
This work proposes a novel in-memory Hamming (7, 4) ECC in a memristor crossbar. Simulation program with integrated circuit emphasis (SPICE) verifications with a Verilog-A memristor model and commercial CMOS transistors are provided to validate the complete steps of Hamming encoding and decoding, including the parity bit and syndrome vector generation. To the best of the authors' knowledge, this work is the first reported in-memory ECC scheme that implements the entire procedure within a memristor crossbar. An in-memory parity generation scheme and an in-memory parity check scheme have been proposed in [17] and [27], respectively, but both have inherent limitations that prevent them from being extended to cover the entire procedure.
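The Hamming (7,4) procedure described above can be sketched in a few lines of Python. This is only an illustrative software model, not the in-memory implementation: the group for p1 (d1, d2, d4) is stated in the text, while the groups for p2 and p3 are inferred here from the syndrome table (d1 maps to (1, 1, 0), d2 to (1, 0, 1), d3 to (0, 1, 1), d4 to (1, 1, 1)). The function names are hypothetical.

```python
# Data indices covered by each parity bit, with data = [d1, d2, d3, d4].
# p1's group is given in the text; p2 and p3 are inferred from the
# syndrome table (assumption of this sketch).
PARITY_GROUPS = (
    (0, 1, 3),  # p1 covers d1, d2, d4
    (0, 2, 3),  # p2 covers d1, d3, d4
    (1, 2, 3),  # p3 covers d2, d3, d4
)

# Syndrome vector -> index of the data bit to flip (from the text).
SYNDROME_TO_DATA_BIT = {(1, 1, 0): 0, (1, 0, 1): 1, (0, 1, 1): 2, (1, 1, 1): 3}


def xor_all(bits):
    """XOR-reduce an iterable of 0/1 values."""
    result = 0
    for b in bits:
        result ^= b
    return result


def encode(data):
    """Return the even-parity bits (p1, p2, p3) for data = [d1, d2, d3, d4]."""
    return tuple(xor_all(data[i] for i in group) for group in PARITY_GROUPS)


def syndrome(data, parity):
    """XOR each 3-bit data group with its stored parity bit."""
    return tuple(
        xor_all(data[i] for i in group) ^ parity[k]
        for k, group in enumerate(PARITY_GROUPS)
    )


def correct(data, parity):
    """Flip the data bit the syndrome points to, if any; return (data, syndrome)."""
    s = syndrome(data, parity)
    fixed = list(data)
    if s in SYNDROME_TO_DATA_BIT:
        fixed[SYNDROME_TO_DATA_BIT[s]] ^= 1
    return fixed, s


if __name__ == "__main__":
    data = [1, 0, 0, 1]              # d1..d4 = 1001, the example of Fig. 1(c)
    parity = encode(data)            # (0, 0, 1)
    corrupted = [0, 0, 0, 1]         # error injected at d1
    print(parity, syndrome(data, parity), correct(corrupted, parity))
```

For the 1001 example, the sketch reproduces the parity 001, a syndrome of (0, 0, 0) without error, and (1, 1, 0) when d1 is flipped. Syndromes with a single '1' would indicate a parity-bit error, which this sketch leaves uncorrected.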
The remainder of this paper is organized as follows. Section II introduces the complementary resistive switching (CRS) memristor adopted in this work [28]. A novel stateful XOR logic implementation based on the CRS memristor is also proposed in Section II. Section III details the circuit techniques for the robust stateful XOR logic, which are followed by SPICE validation results. Section IV presents the verification results from SPICE simulation with the device model and a complete array. Finally, conclusions are provided in Section V.

II. COMPLEMENTARY RESISTIVE SWITCHING (CRS) MEMRISTOR AND XOR LOGIC WITH CRS
The work in [17] proposes and verifies that a current-mode driving scheme enables a linear increment of the word-line voltage with respect to the number of low-resistance state (LRS) memristors in the dataset array. Combined with the unique I-V characteristic of a CRS memristor and gray code, such a linear word-line voltage is used to program parity bits to a CRS memristor, which is the basis of this work. Fig. 2 shows an example I-V curve of the CRS memristor obtained from SPICE simulation with a Verilog-A model, which is used throughout this work. The device parameters are tunable in the model, but most of them, such as Roff/Ron, are chosen to mimic the physical device characteristic in [28]. The unique characteristic of the CRS device originates from switching between opposite defect configurations, which can be read as a high resistance. Assuming that the CRS memristor is initialized to a high-resistance state (HRS), the device goes to the LRS as a conducting path is formed when the voltage applied across the top and bottom electrodes increases in the positive direction. However, if the voltage increases further, the device goes to another HRS (HRS') due to defect migration to the opposite side [17]. Once the voltage stops increasing, the device retains the resistance state at which the increment stopped until a higher voltage is applied or it is initialized to the HRS by a reset operation. The reset happens when a strong negative voltage is applied. As discussed in Section I, the binary XOR logic operation is the key to realizing the parity generation and the Hamming ECC. Although [17] shows that gray code and a multi-bit operation utilizing the intermediate states between HRS/LRS and LRS/HRS' enable parity bit generation without XOR logic, further in-memory processing is not possible due to the complexity of decoding the gray-coded resistance.
Therefore, to realize a complete in-memory ECC, this work does not rely on the multi-bit operation but utilizes stateful in-memory XOR logic. Fig. 3(a) shows a conceptual explanation of the XOR logic with CRS memristors. Given that a voltage that increases linearly with the number of '1's (= LRS memristors) in the dataset is available from the current-mode scheme, an I-V curve of 'M' shape, or equivalently an R-V curve of 'W' shape, can serve as an XOR logic. In addition, the device must retain its resistance state to store the parities and the syndromes; in other words, it should be a stateful logic. In Fig. 3(a), the blue line shows a simplified I-V curve of a single CRS device. If two CRS devices are stacked in series, the I-V curve is a version of the single-CRS curve stretched along the V-axis, as shown by the red line in Fig. 3(a). If the stacked devices are connected in parallel with a single device, the total current is the sum of the two branch currents, which yields the M-shaped I-V curve. For the rest of the paper, this cell is referred to as an XOR memristor. The concept is verified by SPICE simulation with the CRS device model, as shown in Fig. 3(b). Neglecting all intermediate states for simplicity, although they are included in the SPICE model, the XOR memristor has five resistance states: three HRS states (HRS1, HRS2, HRS3) and two LRS states (LRS1, LRS2). Fig. 3(c) shows a state table of the resistance states. Once initialized by a strong negative reset voltage, all three memristors are in the HRS. If a positive voltage of 1 V is applied, R1 flips to LRS, but R2 and R3 stay at HRS because each sees a voltage stress of only 0.5 V due to the stacking. If the voltage increases to ~1.5 V, R1 flips to HRS', but the voltage is still not enough to flip R2 and R3, so the overall resistance becomes HRS2. Once the voltage reaches 2 V, R2 and R3 become LRS, so the overall resistance goes to LRS2. If the voltage keeps increasing, R2 and R3 also reach HRS', which brings the cell to the remaining state, HRS3.
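The state sequence of the XOR memristor can be illustrated with a small behavioral sketch. It is a heavily idealized model under stated assumptions, not the Verilog-A model used in this work: the single-device thresholds (1.0 V and 1.5 V) are the approximate voltages quoted above, the stacked devices are assumed to split the voltage equally, intermediate states are ignored, and the resulting 3.0 V threshold for the stack reaching HRS' is an inference from the "stretched" curve rather than a value given in the text. All names are hypothetical.

```python
# Idealized single-CRS thresholds (approximate values quoted in the text).
V_SET = 1.0    # HRS -> LRS
V_OVER = 1.5   # LRS -> HRS'


def crs_state(v_peak):
    """State of one CRS device after a positive sweep from reset up to v_peak."""
    if v_peak < V_SET:
        return "HRS"
    if v_peak < V_OVER:
        return "LRS"
    return "HRS'"


def xor_memristor(v_peak):
    """Return (R1 state, R2/R3 state, logic bit) for the three-device cell.

    R1 sees the full voltage; the series-stacked R2 and R3 are assumed to
    see half of it each. The cell reads as '1' when either branch is in LRS.
    """
    r1 = crs_state(v_peak)
    r23 = crs_state(v_peak / 2.0)
    bit = int(r1 == "LRS" or r23 == "LRS")
    return r1, r23, bit


if __name__ == "__main__":
    # One sample voltage inside each of the five regions of the M-shaped curve.
    for v in (0.5, 1.2, 1.7, 2.2, 3.2):
        print(f"{v:.1f} V -> {xor_memristor(v)}")
```

Sweeping the peak voltage through the five regions gives the logic outputs 0, 1, 0, 1, 0, which is the alternating pattern an XOR of the '1' count requires.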

III. VALIDATION OF IN-MEMORY XOR WITH CURRENT-MODE DRIVING SCHEME
In the previous section, we discussed how an 'M'-shaped I-V curve is obtained and how it can be used as an XOR logic. To fully implement XOR, the I-V curve should be translated to current versus digital input, which means the XOR memristor should be preceded by a digital-to-analog converter (DAC) that translates digital data (i.e., how many '1's are in the array) to an analog voltage. Fig. 4(a) and (b) show a circuit diagram of the CRS array, which explains how the proposed current-mode driving DAC works. Four data memristors (RD1-RD4) and an XOR memristor share a word-line. When the array is configured for the XOR operation, the data memristors are driven by PMOS transistors whose gates are biased to an analog voltage. When a data memristor is in HRS, the transistor operates in the triode region because the large IR drop across the HRS memristor leaves a low drain-source voltage (VDS). In other words, the current of the path is constrained by the HRS memristor. On the other hand, when a data memristor is in LRS, the biased transistor behaves as a current source because the IR drop across the memristor is low. As a result, the current flowing through the memristor is constrained by the saturation current of the transistor, which means the word-line voltage has little impact on the current. This allows the total current from the memristor array to increase linearly with the number of LRS cells in the array. Due to the nature of parallel resistance, a typical voltage-mode scheme such as [29] cannot provide such linearity. The linear current is translated to a word-line voltage by a parallel build-up resistor RBU. Note that during the build-up phase, the XOR path is disconnected, so the current flows solely to the build-up resistor, as shown in Fig. 4(a). The bleeder current (IBLEED,P) is used to shift the curve (its y-intercept) by introducing a DC offset current to the build-up resistor.
Once the word-line is fully charged, the bit-line of the XOR memristor is connected to VSS as shown in Fig. 4(b), and the build-up voltage simultaneously programs the XOR memristor. The program voltage versus the number of LRS cells in the dataset array is shown in Fig. 4(c) and (d) for the 3-input case (parity bit program) and the 4-input case (syndrome vector program), respectively. Differential non-linearity (DNL) plots normalized to the least-significant bit (LSB) are also depicted in Fig. 4(c) and (d), which demonstrate the linearity (max DNL < 0.3 LSB) of the proposed current-mode DAC. The two-step (build-up and fire) scheme is required to resolve the voltage saturation issue of the one-step scheme in [17], which is explained in Fig. 4(e). If the current directly drives the XOR memristor without the build-up step and the resistor, the current-to-voltage transformation cannot be linear. For example, the voltage saturates at ~V2 even when RD1-RD4 are all in LRS, because once the resistance enters the LRS region the IR drop cannot increase further.
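The linearity argument can be made concrete with a first-order numerical sketch. It compares an ideal current-mode build-up, where each LRS path contributes one transistor saturation current into RBU, against a simple voltage divider formed by parallel LRS cells, and evaluates an endpoint-fit DNL for both. Every component value below (saturation current, bleeder current, resistances, drive voltage) is a hypothetical number chosen for illustration, not a parameter from this work, and HRS leakage is ignored.

```python
def dnl_lsb(voltages):
    """Endpoint-fit DNL of each step, normalized to the average step (LSB)."""
    steps = [b - a for a, b in zip(voltages, voltages[1:])]
    lsb = (voltages[-1] - voltages[0]) / len(steps)
    return [s / lsb - 1.0 for s in steps]


def current_mode_voltage(n_lrs, i_sat=50e-6, i_bleed=75e-6, r_bu=10e3):
    """Ideal build-up voltage: n_lrs saturation currents plus the bleeder
    offset flow into the build-up resistor (hypothetical values)."""
    return r_bu * (n_lrs * i_sat + i_bleed)


def voltage_mode_voltage(n_lrs, v_drive=3.0, r_lrs=10e3, r_load=10e3):
    """Divider between n_lrs parallel LRS cells and a load resistor
    (hypothetical values; HRS cells are treated as open circuits)."""
    if n_lrs == 0:
        return 0.0
    return v_drive * r_load / (r_load + r_lrs / n_lrs)


if __name__ == "__main__":
    counts = range(5)  # 0 to 4 LRS cells, as in the 4-input case
    for name, fn in (("current-mode", current_mode_voltage),
                     ("voltage-mode", voltage_mode_voltage)):
        v = [fn(n) for n in counts]
        worst = max(abs(d) for d in dnl_lsb(v))
        print(name, [round(x, 3) for x in v], f"max |DNL| = {worst:.2f} LSB")
```

In this idealized model the current-mode steps are uniform, so the DNL is zero, whereas the voltage-mode steps shrink as more LRS cells are added in parallel. A real circuit falls between the two, which is why the DNL plots in Fig. 4(c) and (d) are the relevant figure of merit.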

A. Experimental testbench design
The proposed ECC is verified by SPICE simulation with a commercial 45-nm CMOS process design kit (PDK) and the Verilog-A memristor model. Note that the specific device shown in Fig. 2 is used in this experiment; however, the scheme is not sensitive to the device parameters as long as the on/off ratio is comparable or higher, because the transistor design provides enough flexibility. Fig. 5 shows the circuit diagram of the crossbar array and the experiment steps. The array consists of four data memristors and four XOR memristors, all of which share the same word-line. Three XOR memristors generate and store the 3-bit parity for the Hamming ECC, and the last XOR memristor calculates the syndrome vector. In the first phase of the operation, 4-bit data is sequentially written to RD1-RD4. The word-line is driven by VSS (= 0 V) during this phase, and the bit-line of the selected cell is driven by VWRITE, whose value depends on the binary datum to be written to the cell (-1.0 V to program '1' or LRS, +2.5 V to program '0' or HRS).
After the data memristors are programmed with the desired pattern, the parity generation phase starts. During this phase, the word-line switch to VSS stays disconnected, and the current-mode scheme drives the bit-lines of the selected dataset while the unselected ones stay unconnected. This step is repeated three times to program the 3 parity bits to RP1, RP2, and RP3, with the selection based on the parity generation matrix from Fig. 1. For example, for p1 generation, the selection vector is (1, 1, 0, 1), which means RD1, RD2, and RD4 are driven by the transistors while the bottom electrode of RD3 is floated, so the operation is the 3-bit XOR logic discussed in Section III. As described in Section III, the bleeder current is also activated during the parity generation step to bring the transfer curve to the optimum for the given XOR memristor.
In the next step, the syndrome vector is sequentially programmed to the last XOR memristor (RSYN), and RSYN is read out to check for errors. The syndrome programming phase is divided into two sub-phases. Unlike the parity bits, the syndrome vector will not be used in further in-memory operations and therefore does not have to be stored, so it is wasteful to dedicate a memristor to each syndrome element. Therefore, in this experiment, a single shared XOR memristor is used and the syndrome elements are processed sequentially. As a result, the first sub-phase resets RSYN to erase the previously stored bit. After the reset, the 4-bit stateful XOR logic takes place, as shown in Fig. 6(c). During this sub-phase, the dataset is also selected according to the Hamming ECC criteria shown in Fig. 1. For example, for the s1 calculation, RD1, RD2, RD4, and RP1 are selected while the others are floated. Because the 4-bit XOR requires a different offset from the 3-bit XOR, an additional bleeder path is activated to provide an adjusted shift to the curve.
After RSYN is programmed by the 4-bit stateful XOR operation, the read voltage (VREAD = 0.3 V) is applied to the bit-line of RSYN and the word-line is connected back to VSS, as shown in Fig. 6(d). The read current is compared to the reference current (ITH) to determine whether the syndrome is 0 or 1. The syndrome step is repeated three times to obtain the complete syndrome vector (s1, s2, s3).
To fully validate the ECC functionality, an additional step is included in the testbench. As long as the stateful XOR logic functions correctly, there is no error in the data and the parity bits (d1-d4, p1-p3), and the syndrome vector is therefore always (0, 0, 0). So, an error injection step is added after the syndrome phase, and then another syndrome phase is executed to see whether the syndrome vector points to the correct error location. Fig. 6 shows the test result from a case with the 4-bit data of 0011 and an intentional error injected to d1. The first and second plots show the bit-line-voltage waveforms of RD1-RD4 and of RP1-RP3 and RSYN, respectively. The third plot shows the shared word-line-voltage waveform. The last waveform shows the current flowing through RSYN, from which the syndrome vector is obtained during the syndrome readout phase. In the first and third plots, during the first phase (0 to 4 ns), RD1-RD4 are written sequentially while the word-line is tied to VSS (0 V) and the bit-lines are driven by -1.0 V or 2.5 V depending on the data polarity. In the second phase (4 to 7 ns), the word-line voltage is established based on RD1-RD4 and fired to program the correct parity bits to RP1-RP3. In this particular case, the build-up voltages are 1.63 V, 1.08 V, and 1.08 V, which are mapped to HRS2, LRS1, and LRS1 of Fig. 3(b), respectively. From 7 ns to 16 ns, the syndrome reset, program, and readout are repeated three times, once for each of the syndrome vector elements s1, s2, and s3. During the programming steps, HRS2-corresponding voltages (1.55 to 1.67 V) are applied to RSYN. As a result, because RSYN is in HRS2 for all three sub-phases, the read current is always low (~80A), leading to the syndrome vector of (0, 0, 0).
After that, the d1 error injection phase starts at 16 ns, where BL1 is driven by +2.5 V to bring RD1 back to the HRS. From 17 ns to 26 ns, the array repeats the syndrome phase with the injected error. Because d1 is flipped, the syndrome vector should be (1, 1, 0) by the definition of the Hamming ECC, as shown in Fig. 1. In the test, the syndrome programming voltages are 1.13 V (LRS1), 0.94 V (LRS1), and 1.55 V (HRS2), respectively. As a result, the correct syndrome vector of (1, 1, 0) is obtained from the measured currents of (495 A, 406A, 74A).
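The sequence of phases in this testbench can be mirrored by a purely behavioral sketch that replaces the analog build-up voltage with the count of driven LRS cells. It rests on several assumptions that are not stated in the text: the count-to-state mapping (0 to 4 LRS cells giving HRS1, LRS1, HRS2, LRS2, HRS3) is the idealized outcome of the bleeder-shifted transfer curve, the groups for p2 and p3 are inferred from the syndrome table of Fig. 1, and the pattern 0011 is read here as d4 d3 d2 d1, which is the reading consistent with the reported states and with RD1 being in LRS before the error injection. Function names are hypothetical.

```python
# Data indices for (p1, s1), (p2, s2), (p3, s3), with data = [d1, d2, d3, d4].
GROUPS = ((0, 1, 3), (0, 2, 3), (1, 2, 3))

# Assumed XOR-memristor state reached for a given number of driven LRS cells.
STATE_BY_COUNT = ("HRS1", "LRS1", "HRS2", "LRS2", "HRS3")


def stateful_xor(cells):
    """Program a freshly reset XOR memristor from the selected cells.

    Returns (state, bit), where bit is 1 when the cell ends in an LRS,
    i.e. when the read current would exceed the reference current.
    """
    state = STATE_BY_COUNT[sum(cells)]
    return state, int(state.startswith("LRS"))


def generate_parity(data):
    """3-input XOR per group; results are stored in RP1..RP3."""
    return [stateful_xor([data[i] for i in g]) for g in GROUPS]


def syndrome_phase(data, parity_bits):
    """Reset, program, and read the shared RSYN once per syndrome element."""
    return [
        stateful_xor([data[i] for i in g] + [parity_bits[k]])
        for k, g in enumerate(GROUPS)
    ]


if __name__ == "__main__":
    data = [1, 1, 0, 0]                       # d1..d4, assuming 0011 lists d4..d1
    parity = generate_parity(data)
    parity_bits = [bit for _, bit in parity]
    print("parity  :", parity)
    print("no error:", syndrome_phase(data, parity_bits))
    data[0] = 0                               # inject the error: RD1 back to HRS
    print("d1 error:", syndrome_phase(data, parity_bits))
```

Under these assumptions the sketch reproduces the state sequence reported for Fig. 6: HRS2, LRS1, LRS1 for the parity bits, HRS2 in all three syndrome sub-phases before the injection, and LRS1, LRS1, HRS2 afterwards.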

B. Experimental simulation result
Whereas Fig. 6 details the multiple waveforms when the error is injected to d1, Fig. 7 focuses on validating the syndrome vector when the error is injected to d2, d3, and d4, respectively, by observing only the syndrome current. For all cases, the syndrome vector is always (0, 0, 0) before the error injection, which verifies the correct parity generation by the in-memory stateful XOR logic. On the other hand, the vector becomes (1, 0, 1), (0, 1, 1), and (1, 1, 1) after the error is injected to d2, d3, and d4, respectively. This demonstrates the correct Hamming ECC function as well as the stateful XOR logic. Fig. 8 shows the syndrome current over all 4-bit input data cases, with the error injected to d1. No matter which data is written initially, the correct syndrome vector for a d1 error, (1, 1, 0), is obtained after the error injection. Also, the vector is always (0, 0, 0) before the error injection. In summary, the results given in Fig. 6 and Fig. 7 show that the syndrome vector points to where the error happens, and the result in Fig. 8 verifies that the scheme functions robustly with enough margin regardless of the stored data.
Fig. 7. Syndrome current measurement when the error is injected to d2, d3, and d4, respectively.

V. CONCLUSION
This paper presents a novel in-memory Hamming ECC technique in a memristor crossbar. A stateful XOR logic based on CRS memristors is proposed to enable in-memory parity bit generation and syndrome vector calculation. In addition, a two-step (build-up and fire), current-mode bit-line driving scheme allows a linear digital-to-analog conversion within the crossbar, which helps retain enough voltage margin for robust operation. Complete array-level SPICE simulations show that the proposed ECC technique functions correctly with enough margin across all tested data input and error injection combinations.