High-Density Memristor-CMOS Ternary Logic Family

This paper presents the first experimental demonstration of a ternary memristor-CMOS logic family. We systematically design, simulate and experimentally verify the primitive logic functions: the ternary AND, OR and NOT gates. These are then used to build combinational ternary NAND, NOR, XOR and XNOR gates, as well as data handling ternary MAX and MIN gates. Our simulations are performed using a 50-nm process and are verified with in-house fabricated indium-tin-oxide memristors, optimized for fast switching, high transconductance, and low current leakage. We obtain close to an order of magnitude improvement in data density over conventional CMOS logic, and a reduction of switching time by a factor of 13 over prior state-of-the-art ternary memristor results. We anticipate that extensions of this work can realize practical implementations where high data density is of critical importance.


I. INTRODUCTION
CONVENTIONAL digital systems compute using binary logic, where only two possible values are available in the Boolean space. While this allows for large noise margins, there are applications where the need for data density is more critical. For example, commercial solid-state drives use quad level cells (QLC) for high storage density [1], [2], but this comes at the cost of slow write times. Various serial links, such as Gigabit Ethernet [3], employ similar multilevel techniques when the channel bandwidth is insufficient. Analog-to-digital converters also take advantage of ternary logic where redundancy is used to reduce quantization errors [4]-[6].
The design of ternary logic in VLSI-amenable implementations using MOSFETs dates back to the 1980s [7], and BJT design even further back to the 1960s [8]. Multilevel logic enables pre-processing multilevel cells (MLC) prior to binary expansion, and is one of many options for improving high-speed data density and channel bandwidth. But beyond data encoding and high-density storage class memory, multilevel logic has seen limited use. This is because ternary logic circuits require more gates than their CMOS counterparts, which comes with 1) larger area consumption, 2) increased propagation delay, and 3) weaker noise tolerance due to an intermediate logic level. These partially offset the benefits of MLC.
Multi-level non-volatile memories, such as memristors, have generated renewed interest in building dense logic, both digital and multi-state [9], [10]. Two distinct categories of two-level logic have emerged using memristors: state-based logic, which stores the output as the resistance state, and conventional voltage-based logic. For example, material implication logic (IMPLY), memristor-aided logic (MAGIC) and scouting logic use resistance to represent logic states [11]-[14], but are burdened with substantial peripheral overhead [15]. In general, the benefit of reconfigurability in stateful logic is offset by the peripheral control schemes required to implement it. Memristor ratioed logic (MRL), by contrast, resembles conventional CMOS logic in that the output voltage level defines the logical state [16]. While stateful logic may be beneficial in non-von Neumann architectures, EDA tools have been optimized for CMOS-like technologies. The design abstractions enabled by CMOS VLSI are straightforward to translate to MRL, as it relies on a voltage-level representation of data that translates well to functional systems of significant complexity. Equally, however, EDA tools are yet to be optimized for multi-level logic processing, and automation tools for RRAM integration have only just started to emerge after over a decade of heavy investment.
Memristor-CMOS processes integrate nanoscale memristor devices with CMOS in the back end of the line (BEOL), which makes it possible to alleviate some of the above drawbacks. In particular, area utilization is improved by embedding the memristor fabric upwards, rather than outwards [17]-[22]. Other non-volatile memories, such as Flash, rely on floating gates which consume silicon area. Now that memristor-CMOS integration is available in commercial processes (e.g., TSMC 40 nm RRAM process [23], [24]), device variation is less of a barrier to industry use of memristor-CMOS technologies. Rather, the slow switching speed due to the wide band gap of oxide ions, and limited endurance, pose greater challenges. Fast switching speeds typically require subjecting memristors to very large electric fields, causing endurance degradation.
To combat the speed issue, we fabricated a threshold-switching metal-insulator-metal (MIM) structure memristor device, which shows fast switching speeds (≈30 ns), and implemented it in a series of ternary logic gates optimized for this device in a 50 nm process. This device uses indium-tin-oxide (ITO) as the switching layer, with long retention (at least 10 years), a low forming voltage (-1.3 V), and low switching voltages (< ±0.5 V).
Ternary logic improves data density by computing at a higher radix than binary, but this often comes at the expense of additional components and routing. This is mitigated by integrating memristors in the back end of the line, which enables substantially higher data density over conventional CMOS logic.
To reduce the burden of decreased noise margins in multilevel logic, our memristor shows a very high transconductance during switching (a subthreshold swing of 28.44 mV/dec), and has a built-in vanadium selector integrated as the top electrode. This enables threshold switching (≈0.4 V) without the addition of transistor selectors and reduces current leakage, which improves noise tolerance. This device has been integrated into an unbalanced ternary logic family summarized in Table I.
Prior memristor-CMOS ternary logic designs use idealized memristor or transistor models [25]-[28]. There is a lack of physically feasible simulations and experimental demonstrations of a complete memristor-CMOS ternary logic family in the literature. This paper systematically designs, simulates and experimentally verifies an integrated memristor-CMOS logic family using a standard 50 nm process. Our gates achieve state-of-the-art performance compared to other memristor-CMOS designs in terms of area, power and speed. We achieve data density improvements over conventional CMOS gates by a factor of 3.9-25.5, and speed improvements by a factor of 13 over state-of-the-art high-speed memristor ternary logic implementations. All SPICE netlists are documented and available online for reproduction of our results [31].
In Section II, we present the circuits of the primitive memristor-CMOS gates: TAND, TOR, and various forms of the TI gate. Section III presents ternary compound logic built using the primitive gates: TNAND, TNOR, TMAX, TMIN, TXOR, TXNOR. Specifically, TNAND and TNOR gates are constructed by cascading TAND and TOR gates with a TI gate. The TXOR and TXNOR gates are designed using Karnaugh map minimization. SPICE simulations are provided throughout, to validate each circuit as it is presented. Section IV provides detail on device fabrication and characterization, and demonstrates the device's performance in a ternary encoder and decoder. A study of area, power and switching speed is provided to compare against other memristor-CMOS ternary logic circuits.

II. TERNARY LOGIC GATE DESIGN
Ternary logic can be divided into two types: balanced, which is expressed as (−1, 0, +1), and unbalanced. The unbalanced type comes in two forms, positive ternary (0, 1, 2) and negative ternary (−2, −1, 0) [29]. In this work, we focus on unbalanced positive ternary logic, where (0, 1, 2) = (GND, $V_{DD}/2$, $V_{DD}$). The truth table is given in Table II, where the output values of the TAND and TOR gates are the minimum and maximum values of the inputs, respectively.
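As a minimal behavioral sketch of this encoding (not taken from the SPICE netlists), the snippet below maps the three logic levels onto voltages and enumerates the nine two-input combinations using the minimum/maximum definition above. The function names and the default 1 V supply argument are illustrative assumptions.

```python
from itertools import product

LEVELS = (0, 1, 2)  # unbalanced positive ternary logic levels


def level_to_voltage(level, vdd=1.0):
    """Map a logic level to its nominal voltage: (0, 1, 2) -> (GND, VDD/2, VDD)."""
    if level not in LEVELS:
        raise ValueError("ternary level must be 0, 1 or 2")
    return level * vdd / 2


def tand(a, b):
    """Ternary AND: the minimum of the two inputs."""
    return min(a, b)


def tor(a, b):
    """Ternary OR: the maximum of the two inputs."""
    return max(a, b)


def truth_table():
    """Rows of (a, b, TAND, TOR) for all nine input combinations."""
    return [(a, b, tand(a, b), tor(a, b)) for a, b in product(LEVELS, repeat=2)]


if __name__ == "__main__":
    for row in truth_table():
        print(row)
```

This is purely a logic-level model; it says nothing about the memristor dynamics that realize the levels in hardware.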

A. Ternary AND and Ternary OR
The TAND and TOR gates are a simple extension of MRL in [16], where two memristors are connected with alternating polarities, as shown in Fig. 1. However, in our ternary implementation, each input has three allowable states.
When the input voltages differ, the output voltage is determined by the voltage divider principle. As an example, consider the case $v_{in1} > v_{in2}$ in the TAND gate from Fig. 1(a). Current flows from $v_{in1}$ to $v_{in2}$, and the voltage across each device can be obtained from:
$$v_{M1} = (v_{in1} - v_{in2}) \frac{R_{M1}}{R_{M1} + R_{M2}} \tag{1}$$
$$v_{M2} = (v_{in1} - v_{in2}) \frac{R_{M2}}{R_{M1} + R_{M2}} \tag{2}$$
These equations assume there is negligible loading from fanout. As this may not necessarily be the case for memristive logic, we demonstrate a method to buffer stages using a source follower in our experimental results in Section IV. Assuming the voltage drops across each memristor are greater than the threshold for switching, a sufficient electric field is built up across each device, causing them to switch. The bipolar switching characteristics of thin-film metal oxide memristors cause $M_1$ to switch off ($R_{M1} \to R_{OFF}$), and $M_2$ to switch on ($R_{M2} \to R_{ON}$). Assuming $R_{OFF} \gg R_{ON}$, the output voltage $v_{out}$ can be obtained as:
$$v_{out} = v_{in2} + (v_{in1} - v_{in2}) \frac{R_{ON}}{R_{ON} + R_{OFF}} \approx v_{in2} = \min(v_{in1}, v_{in2}) \tag{3}$$
Note the output is approximate, as there is a small potential drop across $M_2$. This drop will increase with the fan-in of the gate, and imposes a limit on the allowable number of inputs for a TAND gate. It can be mitigated by using sufficiently high on/off resistance ratios (ideally a factor of 10 for two inputs). The same procedure can be carried out for the TOR gate in Fig. 1(b) to give the following result:
$$v_{out} = v_{in1} - (v_{in1} - v_{in2}) \frac{R_{ON}}{R_{ON} + R_{OFF}} \approx v_{in1} = \max(v_{in1}, v_{in2}) \tag{4}$$
The output voltages are provided as a function of the input and summarized in Table III, which is consistent with Table II.
The SPICE simulation results of all nine possible inputs to the TAND and TOR circuits are provided in Fig. 2, using Knowm's memristor model from [30]. The parameters used are provided in Table IV. The transistor models are from a 50 nm process (Level 54 BSIM4), with a supply of $V_{DD} = 1$ V. Detailed parameters can be found in the online repository containing the SPICE netlist [31].
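The divider behaviour described above can be checked numerically. The following is a rough steady-state sketch, assuming the memristors have already switched and that fan-out loading is negligible; the resistance values are hypothetical placeholders (the Table IV parameters are not reproduced here), and the helper names are illustrative.

```python
def tand_output(v_in1, v_in2, r_on=1e3, r_off=1e5):
    """Steady-state TAND output voltage after switching, ignoring fan-out loading.

    r_on and r_off are hypothetical values, not the Table IV parameters.
    """
    v_hi, v_lo = max(v_in1, v_in2), min(v_in1, v_in2)
    # The device tied to the lower input is ON, so the output sits just above v_lo.
    return v_lo + (v_hi - v_lo) * r_on / (r_on + r_off)


def tor_output(v_in1, v_in2, r_on=1e3, r_off=1e5):
    """Steady-state TOR output voltage: polarities are reversed relative to TAND."""
    v_hi, v_lo = max(v_in1, v_in2), min(v_in1, v_in2)
    # The device tied to the higher input is ON, so the output sits just below v_hi.
    return v_hi - (v_hi - v_lo) * r_on / (r_on + r_off)


def nearest_level(v, vdd=1.0):
    """Return the logic level (0, 1, 2) whose nominal voltage is closest to v."""
    return min((0, 1, 2), key=lambda level: abs(v - level * vdd / 2))


if __name__ == "__main__":
    for ratio in (10, 100):
        v = tand_output(1.0, 0.5, r_on=1e3, r_off=1e3 * ratio)
        print(f"R_OFF/R_ON = {ratio}: TAND(VDD, VDD/2) = {v:.4f} V")
```

Sweeping the on/off ratio in this way gives a feel for how the residual drop across the on-state device eats into the noise margin around the intermediate level.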

B. Ternary Not
In unbalanced ternary logic systems, inverters can be classified into three categories: simple ternary inverters (STI), positive ternary inverters (PTI), and negative ternary inverters (NTI). As a supplement to Table II, the truth tables for the three inverters are given in Fig. 3(b).
1) STI Logic Gate: The STI gate is shown in Fig. 3(a). The NMOS transistor $N_1$ must be 'stronger' than $N_2$ (i.e., width $W_1 > W_2$, and/or bulk-to-source voltage $V_{BS1} > V_{BS2}$), such that the threshold voltage $V_{TH1} < V_{DD}/2$ and $V_{TH2} > V_{DD}/2$. We note that the relation between threshold and sizing and back-body biasing is highly process-dependent, and may not generalize to all cases. The circuit operation in each of the three modes is described below:
• Input Logic '0': when the input is grounded (logic '0'), both transistors are off, and the output is pulled up to $V_{DD}$ (logic '2') through $M_1$. When connected to a load, $M_1$ will switch off as the negative electrode of the memristor is tied to $V_{DD}$, ensuring a highly resistive current pathway for low static power dissipation.
• Input Logic '1': when the input is set to $V_{DD}/2$, $N_1$ is turned on and $N_2$ remains off. Current will flow through the two memristors from $V_{DD}$ to ground. As a result, the resistance of both $M_1$ and $M_2$ will increase to $R_{OFF}$. The resistive divider drives the output to $V_{DD}/2$. Orienting the memristors with their negative electrodes positively biased ensures they will both switch off, thus minimizing current draw.
• Input Logic '2': when the input is set to $V_{DD}$, both transistors are on. Consequently, the output terminal is shorted to ground through $N_1$ and $N_2$.
2) PTI and NTI Logic Gates: PTI and NTI gates have the same circuit structure, consisting of one memristor and one transistor, i.e., only the upper half of the STI gate is required (Fig. 4). However, the threshold voltage of $N_1$ must be below $V_{DD}/2$ in the NTI and above $V_{DD}/2$ in the PTI, which can be achieved by appropriate sizing (increasing $W_1$ for the NTI, decreasing $W_1$ for the PTI), or by altering the substrate potential (increasing $V_{BS}$ for the NTI, decreasing $V_{BS}$ for the PTI). To illustrate, consider a DC sweep starting with $V_{in} = 0$. Initially, $N_1$ is off, and so the output is pulled up through $M_1$. An input logic 0 corresponds to an output logic 2. If the input is increased above the NTI threshold voltage (e.g., $V_{DD}/4$ for optimal noise margins), then $N_1$ will pull the output down. Here, an input logic of 1 will correspond to an output of 0 for the NTI. However, as the PTI threshold ($3V_{DD}/4$ for optimal noise margins) is higher than that of the NTI, the PTI output will remain pulled up. Therefore, for the PTI, an input of logic 1 gives an output of logic 2.
Driving the input all the way up to $V_{DD}$ will have no further effect on the NTI as $N_1$ is already on. The PTI will now behave similarly, as its $N_1$ will also switch on. Therefore, an input of logic 2 will give a low output for both the PTI and NTI.
Simulation results for the three classes of inverters are shown in Fig. 5 for all possible input cases, using the same memristor parameters previously given in Table IV. On inspection, the results are consistent with the truth table in Fig. 3(b).
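The DC behaviour of the three inverters can be summarized with an idealized threshold model. This is a sketch only: it treats each NMOS as a perfect switch at its threshold, ignores memristor dynamics, and assumes thresholds of $V_{DD}/4$ and $3V_{DD}/4$ (the values quoted above for the NTI and PTI, reused here for the two STI transistors as an assumption).

```python
def nti_voltage(v_in, vdd=1.0):
    """NTI: the single NMOS (threshold VDD/4) pulls the output low once it turns on."""
    return vdd if v_in < vdd / 4 else 0.0


def pti_voltage(v_in, vdd=1.0):
    """PTI: same structure as the NTI, but with a threshold of 3*VDD/4."""
    return vdd if v_in < 3 * vdd / 4 else 0.0


def sti_voltage(v_in, vdd=1.0):
    """STI: N1 (assumed threshold VDD/4) and N2 (assumed 3*VDD/4) give three regions."""
    if v_in < vdd / 4:
        return vdd          # both transistors off: pulled up through M1
    if v_in < 3 * vdd / 4:
        return vdd / 2      # only N1 on: resistive divider of M1 and M2
    return 0.0              # both on: output shorted to ground


if __name__ == "__main__":
    for name, gate in (("STI", sti_voltage), ("PTI", pti_voltage), ("NTI", nti_voltage)):
        print(name, [gate(v) for v in (0.0, 0.5, 1.0)])
```

Evaluating each function at 0, $V_{DD}/2$ and $V_{DD}$ reproduces the inverter behaviour described in the text: the STI maps (0, 1, 2) to (2, 1, 0), the PTI to (2, 2, 0), and the NTI to (2, 0, 0).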

III. COMBINATIONAL TERNARY LOGIC
The previous section presented and verified the TAND, TOR and TI gates via SPICE simulations. The following sections use these gates compositely to build the TNAND, TNOR, TMAX, TMIN, TXOR, and TXNOR gates.

A. Ternary NAND and Ternary NOR
TNAND and TNOR gates can be intuitively constructed by connecting TAND and TOR gates to an STI gate, as shown in Fig. 6. SPICE simulations are provided in Fig. 7. The finite output impedance of the TAND and TOR gates presents no practical issues, as the input to the next stage is effectively buffered by the large input impedance of the NMOS gates.
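At the logic level, this cascade is simply an STI applied to the TAND or TOR output. A minimal behavioral sketch follows (function names are illustrative, and no circuit effects are modelled):

```python
def sti(x):
    """Simple ternary inverter on logic levels: 0 -> 2, 1 -> 1, 2 -> 0."""
    return 2 - x


def tnand(a, b):
    """TNAND: a TAND gate (minimum) followed by an STI."""
    return sti(min(a, b))


def tnor(a, b):
    """TNOR: a TOR gate (maximum) followed by an STI."""
    return sti(max(a, b))


if __name__ == "__main__":
    for a in (0, 1, 2):
        for b in (0, 1, 2):
            print(a, b, tnand(a, b), tnor(a, b))
```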

B. Ternary Maximum and Minimum
As described in the previous section, TAND and TOR output the minimum and maximum value of the two inputs, respectively. TMIN and TMAX gates with more than two inputs can therefore be implemented by extending the inputs to the TAND and TOR gates, as shown in Fig. 8(a)-(b). SPICE simulations are shown in Fig. 8(c) using a 3-input gate, which verifies that the output of the TMIN gate is always the smallest of the input values, and the output of the TMAX gate is the largest of the inputs.
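The multi-input extension can be sketched at the logic level as follows. This only enumerates the expected outputs for an n-input gate; it does not capture the fan-in dependent voltage drop noted in Section II. The names `tmin`, `tmax` and `sweep` are illustrative.

```python
from itertools import product

LEVELS = (0, 1, 2)


def tmin(*inputs):
    """n-input TMIN: the smallest of the ternary inputs."""
    if not inputs:
        raise ValueError("at least one input is required")
    return min(inputs)


def tmax(*inputs):
    """n-input TMAX: the largest of the ternary inputs."""
    if not inputs:
        raise ValueError("at least one input is required")
    return max(inputs)


def sweep(n=3):
    """Expected (TMIN, TMAX) outputs for every combination of n ternary inputs."""
    return {combo: (tmin(*combo), tmax(*combo)) for combo in product(LEVELS, repeat=n)}


if __name__ == "__main__":
    for combo, outputs in sweep(3).items():
        print(combo, outputs)
```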

C. Ternary XOR and XNOR
In order to design the TXOR and TXNOR gates, we construct their Karnaugh maps (Fig. 9). Through minimization, we obtain the logic functions (5) and (6), from which a minimized gate-level circuit can be constructed, formed of a cascade of TAND, TOR, TNAND and ...
Fig. 7. SPICE simulation results of the TNAND and TNOR gates. Transients in both plots occur due to discontinuities in the memristor model, which also led to challenges for simulation convergence. The fix was to expand to a millisecond timescale. Note that this is a limitation of the numerical methods used in SPICE; our experimental results successfully realized nanosecond timescales, which will be shown in the following sections.
(Figure caption, partially recovered.) ... [32]. This is schematically the same as a standard binary XOR and XNOR gate. It is preferable here to use TOR and TAND gates as they occupy less chip area. (b) SPICE simulation results. Logic stages with high output impedance must be buffered to be capable of driving subsequent stages, as was the case here. This is demonstrated in the experimental results.
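The minimized functions (5) and (6) are not reproduced in this text. One composition that is consistent with the description (schematically the same as the standard binary XOR and XNOR gates) is sketched below. It is an assumption rather than the confirmed form of (5) and (6): TXOR is taken as TAND(TOR(A, B), TNAND(A, B)) and TXNOR as TOR(TAND(A, B), TNOR(A, B)).

```python
LEVELS = (0, 1, 2)


def sti(x):
    return 2 - x


def tand(a, b):
    return min(a, b)


def tor(a, b):
    return max(a, b)


def tnand(a, b):
    return sti(tand(a, b))


def tnor(a, b):
    return sti(tor(a, b))


def txor(a, b):
    """ASSUMED form, mirroring binary XOR = (A + B) . NAND(A, B)."""
    return tand(tor(a, b), tnand(a, b))


def txnor(a, b):
    """ASSUMED form, mirroring binary XNOR = (A . B) + NOR(A, B)."""
    return tor(tand(a, b), tnor(a, b))


def karnaugh_map(gate):
    """3x3 map of gate outputs, rows indexed by A and columns by B."""
    return [[gate(a, b) for b in LEVELS] for a in LEVELS]


if __name__ == "__main__":
    print("TXOR ", karnaugh_map(txor))
    print("TXNOR", karnaugh_map(txnor))
```

Under this assumed composition, TXNOR is the STI complement of TXOR for every input pair, which serves as a quick sanity check on the pair of functions.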

IV. EXPERIMENTAL RESULTS
In this section, we will briefly describe the process used to fabricate the ITO memristors, along with the parameters relevant to gate-level logic. This is important because this device enables us to overcome the difficulties in realizing a functional memristor-CMOS ternary logic family: namely, the fast switching speed, low voltage programming, and the built-in selector which suppresses subthreshold currents. This device is then implemented as a ternary encoder and decoder to verify the correct operation of the gates presented in previous sections.

A. Device Fabrication
The memristor is a V/ITO(O$_2$)/TiN MIM structure (Fig. 11(b)) fabricated in-house. We deposited a 200-nm-thick TiN layer as the bottom electrode using chemical vapor deposition on a Ti/SiO$_2$/Si substrate. A 10-nm thin film of ITO is deposited via RF sputtering from an ITO target, with an Ar gas flow rate of 30 sccm and an O$_2$ flow rate of 20 sccm at 8 mTorr, which oxidizes the ITO film to create ITO(O$_2$). The switching layer is patterned using a mask aligner process with a cell size of 600 nm × 600 nm. The minimum transistor width used here is 500 nm, which allows for reasonably good alignment between memristor and transistor. To form the top electrode, a 100-nm-thick V layer is deposited via DC sputtering. Self-oxidation of the V layer creates a built-in selector. Therefore, this device integrates a one-selector-one-memristor (1S1R) cell into the same process, and is structurally simple to fabricate. Finally, Pt is deposited to prevent further oxidation of the top electrode. This completes the V/ITO(O$_2$)/TiN structure.

B. Device Characterization
The device was characterized using a B1525A semiconductor parameter analyzer for pulse measurements, B1500 and B1505A for DC measurements, and B1530A for high-speed I-V measurements. Fig. 12(a) shows the results of an I-V sweep using a peak amplitude of ±0.5 V. The selector-dominated operating region (colored red) ensures leakage suppression during subthreshold operation. Set and reset occur at approximately ±0.4 V. Prior to setting the device, there is a narrow margin where the built-in selector switches on, enabling a read operation without setting the device. This is especially useful for storage class non-volatile memory, stateful logic, and neuromorphic computing. The selector can be modeled as a large series resistance to the memristor while it is off. When in a pull-up network, this series resistance suppresses quiescent current from supply to ground. For example, if the output is supposed to be a logic '0', this suppression will reduce any voltage increase from supply to output, and therefore improve noise immunity.
Note that this is a highly digital device which sets and resets rapidly with a very high transconductance. This can be seen in Figs. 12(b)-(d), where the subthreshold swing is measured to be 28.44 mV/dec, ensuring usable noise margins at low supply voltages. This fast switching action is expected to occur due to the use of a low-k switching layer, which highly concentrates the electric field. This also allows for a small forming voltage (≈1.3 V). Cycle-to-cycle variation is characterized across 100 cycles measured with a voltage sweep of ±0.5 V peak amplitude, shown in Fig. 13. A summary of memristor parameters is provided in Table V.

C. Ternary Encoder and Decoder
To experimentally verify the logic gates and device presented, we prototyped a ternary decoder and encoder at the board level, using supply and threshold parameters near-identical to those of the NMOS transistors used in the simulations. We constructed a 1-to-3 decoder and a 3-to-1 encoder. The decoder takes a ternary input and generates three unary functions of either logic 0 or 2 at the output. Our design of the decoder consists of a PTI gate, two NTI gates, and a TNOR gate (Fig. 14). Note the finite output impedance of the NTI and PTI gates, which makes them poor drivers when cascaded to stages with low input impedance. Therefore, when connecting high output impedance stages (such as the NTI and PTI gates) to low input impedance stages (such as the TNOR gate), a source follower should be used as a buffer. Although this results in a small potential drop from the gate to source, it is boosted back up in subsequent stages. In our experiments, the nonlinear gain of a source follower was tolerable as there are only three distinguishable voltage levels of concern. Where necessary, this can be alleviated by adding a constant current source (e.g., an NMOS transistor in saturation) to the source, or removing the body effect. An alternative method is to use a minimum-sized bleeder PMOS to restore charge to the output node.
Another clear drawback is the necessity of an additional NMOS wherever a poor driver is connected to a low impedance gate. But as we have avoided the use of PMOS transistors by pulling up with memristors, and given that the memristor would be integrated in the BEOL, this consumes less chip area than a standard CMOS inverter, even with the additional logical state. An area analysis will be given in the next section. Experimental results of the ternary decoder are provided in Fig. 15, which operates as expected on inspection.
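A logic-level sketch of a 1-to-3 decoder built from one PTI, two NTI gates and a TNOR gate is given below. The wiring shown is an assumption (a common ternary decoder arrangement) rather than a transcription of Fig. 14, and the source-follower buffers are not modelled since they do not change the logic function.

```python
def nti(x):
    """Negative ternary inverter: 0 -> 2, otherwise 0."""
    return 2 if x == 0 else 0


def pti(x):
    """Positive ternary inverter: 2 -> 0, otherwise 2."""
    return 0 if x == 2 else 2


def tnor(a, b):
    """TNOR: maximum of the inputs followed by a simple ternary inverter."""
    return 2 - max(a, b)


def decode(x):
    """ASSUMED wiring: returns (X0, X1, X2), where Xk is 2 only when x == k."""
    if x not in (0, 1, 2):
        raise ValueError("ternary input must be 0, 1 or 2")
    x0 = nti(x)          # high only for input 0
    x2 = nti(pti(x))     # high only for input 2
    x1 = tnor(x0, x2)    # high only when neither X0 nor X2 is high
    return (x0, x1, x2)


if __name__ == "__main__":
    for x in (0, 1, 2):
        print(x, decode(x))
```

Each output takes only the values 0 or 2, matching the description of three unary functions.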
A similar technique is adopted in the construction of a 3-to-1 encoder. This time, we introduce a PSTI inverter in Fig. 16(a) which behaves similarly to the STI inverter, but with the caveat that it does not fully pull down to 0. Rather, the output is limited by the resistive divider effect, and can only go as low as half the supply voltage. This is needed for the ternary encoder in Fig. 16(b), where an intermediate output must be generated from high and low input signals. Two alternative topologies are presented. The highlighted nodes must be buffered, which is achieved with a source follower in the same way as shown with the ternary decoder. Experimental results are shown in Fig. 17. In all cases, the initial conditions of the memristors were set for the worst-case scenario: if the output is to be high, we initialized our memristors to favor the pull-down network, and vice versa; if the output is to be at the middle logic level, we initialized our memristors to be off to maximize RC delay.

V. COMPARISON AND DISCUSSION
For a comparison against their digital CMOS counterparts, we designed the layout of the ternary logic gates using Cadence tools. An area and data density comparison is provided in Table VI. The silicon area and memristor fabric consumed are treated separately as they are vertically integrated. The memristor used is described in the previous section, with a cell size of 600 nm × 600 nm. The (proposed size)/(CMOS size) metric shows the silicon area consumption of our ternary gates in the worst case (with an additional source follower stage included) as a proportion of the equivalent digital counterpart. This is also depicted in the bar plot in Fig. 18. Data density improvement is calculated by multiplying the silicon area consumption improvement factor by the additional improvement gained in using ternary logic over digital logic. This improvement is quantified by a factor of log(3)/log(2) = 1.58. In the case without the source follower, the TAND and TOR gate area improvement would tend to infinity. To give a meaningful result, we have included the 65 nm × 65 nm contacts required to connect the two memristors together in the top metal layer and treated that as silicon occupation. Vertical integration of memristors therefore gives a substantial advantage in data density over conventional CMOS logic.
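The data density calculation described above can be expressed compactly. In this sketch the area arguments are hypothetical placeholders, not values from Table VI, and the function name is illustrative.

```python
import math

# Information carried per ternary digit relative to a binary digit: log(3)/log(2).
RADIX_GAIN = math.log(3) / math.log(2)


def data_density_improvement(cmos_area, proposed_area):
    """Silicon area improvement factor multiplied by the ternary radix gain.

    Both areas must be in the same units; the values passed in below are
    hypothetical and are not taken from Table VI.
    """
    if cmos_area <= 0 or proposed_area <= 0:
        raise ValueError("areas must be positive")
    return (cmos_area / proposed_area) * RADIX_GAIN


if __name__ == "__main__":
    # Hypothetical example: a ternary gate occupying half the silicon area.
    print(round(data_density_improvement(cmos_area=10.0, proposed_area=5.0), 2))
```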
The area of the NTI and PTI gates can be reduced by modifying body voltage, but we have again assumed the worst case in the absence of additional supply rails, and used transistor sizing to alter threshold. Another interesting result arises when comparing the area of the STI and the NTI. Despite the NTI having a smaller transistor count, the STI is smaller due to sizing requirements. Most prior literature compares size on the basis of transistor count [26]-[28], [33], [34]. As is clear from this comparison, this practice does not give an accurate reflection of chip area utilized. When ternary logic is used, sizing is often determinative of operation and tends to outweigh the effect of transistor count.
Power consumption of the primitive gates is presented in Table VII. Static power dissipation is estimated at the worst case using the listed inputs by taking the sum of the voltage-current product through each element. This is done using extraction from layout which passes DRC/LVS checks in Cadence. Average power dissipation is calculated by averaging the static power dissipation for every input combination. Ternary logic is typically designed by one of two methods: either using multiple supply rails, or by dividing down a single supply. In memristive logic, the latter strategy is utilized to avoid the need for charge pumps and additional power regulation. Therefore, memristive logic suffers from substantial power dissipation in comparison to its CMOS counterpart, which is on the order of tens of nanowatts.
A fairer comparison would be against other memristor-CMOS implementations of ternary logic, but there is a lack of any experimental results in the literature. Although device-level demonstrations of ternary logic are also sparse, some innovative techniques relying on stateful-driven methods have been proposed in Refs. [35], [36]. Though power metrics of an integrated system have not been reported in either, the forming voltage required in Ref. [35] is in excess of 7 V, whereas our V/ITO(O$_2$)/TiN device only requires 1.3 V. Their ZnO memristor requires milliamp compliance currents and both instances rely on state-driven logic, which has been shown to have overhead CMOS circuitry over 50 times that of a conventional CMOS implementation [15]. Neglecting this overhead, we can make an estimation based on current draw alone that our power dissipation is on the order of $10^3$ times more efficient, at the expense of reduced reconfigurability.
The work in [27] simulates a CNTFET TNOR gate which consumes 11.29 mW of power, and a memristor-CNTFET TNOR gate which consumes 34.62 mW. For a fair comparison, we assume the memristors dominate power dissipation, and normalize the on-resistance value from 50 Ω to 100 Ω, so equivalent power dissipation would be approximately halved. Our TNOR implementation would then achieve close to an order of $10^2$ better power efficiency.
Fig. 14. Gate level schematic of the proposed ternary decoder, with transistor level depiction using a pair of source followers in between stages to improve the NTI as a driver. Here, $g_m$ is the transconductance of the source follower, $R_{OUT}$ refers to the output resistance of the previous stage, and $R_{IN}$ is the input impedance to the following stage. $R_{OFF}$ and $R_{ON}$ are chosen here for the worst case. Note that no additional source resistance is required for either source follower as the memristors from the TOR gate act as $R_S$. The above approximations of input and output impedance neglect channel-length modulation, but our experimental results demonstrate this is a reasonable assumption.
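The normalization step can be written out as a short worked estimate. It assumes, as stated, that the memristors dominate dissipation so that power scales inversely with on-resistance at a fixed supply; applying it to the 34.62 mW memristor-CNTFET figure is our reading of the comparison rather than a value reported in [27].

```latex
P \propto \frac{V^2}{R_{ON}}
\quad\Rightarrow\quad
P_{100\,\Omega} \approx P_{50\,\Omega} \cdot \frac{50\,\Omega}{100\,\Omega}
= 34.62\ \text{mW} \times 0.5 \approx 17.3\ \text{mW}
```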
A closer comparison point is available in [28], which simulates worst-case static power in their TAND and TOR gates to be 8 μW. We performed the analysis in our own design environment with modified parameters (on and off resistances are reduced by an order of magnitude, from 1 kΩ to 100 Ω and from 100 kΩ to 10 kΩ); supply voltage is adjusted from 0.9 V to 1 V. An additional supply rail is also required in this work, which we implemented using a resistive divider (a pair of 10 kΩ memristors), and determined that our TAND gate consumes 61.6% of the total power, and our TOR gate consumes 59.8%, which approaches a factor-of-two improvement.
Speed is arguably the biggest shortfall of memristor-based logic when compared to conventional CMOS logic. This occurs because of the large bandgap of ions, which means substantially more energy (and therefore time) is needed to reprogram memristors than to create an inversion layer in a transistor channel. Conventionally, one would calculate the RC delay to estimate the propagation delay of a logic gate, but the switching time of memristors will outweigh the RC delay in advanced processes. The transit time in the 50 nm process is 9.489 ps, and there is simply no memristor logic family that can compete with this unless implemented as resistor-transistor logic. Some prior literature reports picosecond propagation delays, but these are simulation results which idealize the gate delay as a function of only RC delay, and do not account for switching limits of memristors [26], [27]. Nonetheless, our fabricated memristor-CMOS ternary logic is competitive with other memristor-CMOS implementations. As noted in Table V, our memristor switching speed is approximately 30 ns. The switching speed in Ref. [35] is 400 ns, showing our work is an improvement by a factor of 13 over the only other device implementation of ternary memristor logic.

VI. CONCLUSION
To our knowledge, this is the first hardware implementation of a complete memristor-CMOS ternary logic family. Therefore, we have provided sufficient area, power and speed metrics to enable future memristor-CMOS implementations to generate a figure-of-merit for future comparison. Our work demonstrates an implementation with extremely high data density. We have optimized for speed at the device level which outperforms all other ternary logic simulations using memristors, but it also highlights shortcomings in power and speed when compared to conventional CMOS. The high-density gates we have designed can advance the state-of-the-art in multi-level applications, such as storage class memory which demands high data density, and in ternary content addressable memories which perform a look-up operation to query between three different memory states. In the interest of reproducibility, all simulation data is available online by following the link provided in Ref. [31].

Chih-Yang Lin received the bachelor's degree from