Cross-Layer Reliability Modeling of Dual-Port FeFET: Device-Algorithm Interaction

The Ferroelectric Field-Effect Transistor (FeFET) is an emerging Non-Volatile Memory (NVM) technology enabling novel data-centric architectures that go far beyond von Neumann principles. Nevertheless, FeFET devices exhibit significant variations that can severely restrict their applicability. Temperature further exacerbates variation effects because it degrades ferroelectric parameters. Hence, it is indispensable to investigate and model design-time variations, run-time variations, and stochastic variations due to spatial fluctuation of ferroelectric domains under different temperatures. The dual-port FeFET has recently been proposed and demonstrated as a novel structure that offers, for the first time, disturb-free read operation along with a >10× larger memory window (MW) compared to conventional FeFETs. However, all the aforementioned variations are amplified in this new structure. This work analyzes, for the first time, the impact of temperature variation on dual-port FeFETs in a cross-layer manner, from the device level to the circuit and system levels, and compares it with conventional FeFETs. Through our cross-layer framework, we demonstrate the severe impact of variation on FeFET reliability despite the significant increase in the MW that the dual-port FeFET offers. Even Hyperdimensional Computing is affected, despite its remarkable robustness against errors. All in all, our work reveals that a larger MW at the device level does not necessarily translate to benefits at the application level. Hence, investigating and modeling variability effects in a cross-layer manner is indispensable.


I. INTRODUCTION
THE rapid growth of data-intensive applications like Machine Learning and Artificial Intelligence (AI) demands frequent data transfer between the compute engines and memory units. This seriously challenges existing von Neumann-based computing systems because of the energy-intensive and frequent off-chip memory accesses. Compute-in-Memory (CiM), in which the computation is performed efficiently within the memory itself, has been proposed to overcome this bottleneck [1]. CiM using traditional CMOS technology like SRAM has considerable area and leakage overheads [1]. Numerous initiatives have been made to resolve these challenges, including the use of emerging Non-Volatile Memory (NVM) technologies such as spintronic [2], resistive RAM (ReRAM) [3], and Ferroelectric FET (FeFET) [4] to create compact CiM designs. CiM based on ReRAM and spintronic devices suffers from inefficient performance, poor integration with existing CMOS technologies, and high energy due to the current-driven write methods [1].
FeFET has been proposed as one of the emerging NVM technologies that can overcome these challenges [5]. Excellent CMOS compatibility owing to the discovery of ferroelectricity in HfO2, voltage-based read and write operation that requires much less energy, and a higher ON/OFF current ratio make it a much better choice to implement ultra-efficient CiM. However, FeFET has limitations that must be overcome before its widespread adoption, such as poor endurance, read disturb, and a small memory window (MW).
The recently proposed asymmetric double-gated "dual-port" FeFET can alleviate a few of these problems [6]. Compared to the conventional single-port FeFET, where the memory states are read by applying a voltage to the front gate (FG), the dual-port FeFET works by reading the memory states from the back gate (BG) of the transistor. This allows a much larger MW (>10×) due to the capacitive coupling of the FG and BG, as well as improved retention and read-disturb-free operation because of the separation of the read and write gates [6], [7], [8]. However, the dual-port FeFET comes with its own challenges. In [7], we demonstrated that reading states from the BG compounds transistor variability despite the increase in the MW.
The increased variability can hamper the basic functionality of the circuits and systems [9]. Another considerable challenge for the FeFET, which has been only marginally investigated, is the run-time variability caused by temperature fluctuations. In [10], the authors identify degradation of FE parameters, such as polarization strength and coercive field, due to temperature increase. Therefore, it is crucial to examine the impact of variability on the reliability of FeFET-based applications, especially CiM systems.
Prior works have shown that the reliability of FeFET can impair the performance of dependent systems [11], [12]. In light of this, Hyperdimensional Computing (HDC) is a newly developed Machine Learning algorithm that runs on this unreliable hardware based on CiM architectures [13], [14]. This is made possible by calculating a Hamming distance (HD) using Ternary Content-Addressable Memory (TCAM) arrays within the memory itself. Nevertheless, none of the prior works includes all the different sources of variation or considers their impact on a robust algorithm like HDC. Further, the dual-port FeFET has to be compared to the single-port FeFET in the presence of variability to evaluate the benefits at the application level.
Our main contributions within this paper are as follows:
(1) At the device level, we implement a novel compact model for the newly-proposed dual-port FeFET and validate it against well-calibrated TCAD data.
(2) At the circuit level, we implement a TCAM cell for the first time using dual-port FeFETs and then investigate its functionality.
(3) At the architectural level, we implement an array of TCAM cells that performs CiM to calculate HD similarities efficiently.
(4) At the system level, we evaluate how run-time variability (due to temperature) along with design-time variability (due to process variation) results in errors in the performed CiM.
(5) At the algorithm level, we evaluate how the aforementioned induced errors impact the inference accuracy of the HDC algorithm, which is a novel brain-inspired Machine Learning algorithm that holds the promise of being robust against errors.
Fig. 1 demonstrates our cross-layered framework. From bottom to top, we have three layers (device, circuit, and system) that are linked together to enable accurate reliability estimations.
The first layer consists of the device-level simulations. Here, we have simulated the dual-port FeFET structure in the Synopsys Sentaurus TCAD and extracted the variations in the threshold voltage (V TH ) due to design-time variability, run-time variability and stochastic variations due to spatial fluctuation of ferroelectric domains under different temperatures. TCAD tools for device simulations are accurate but slow. Therefore, they are not suitable for large circuit implementation. Hence, to speed up our simulations, we developed the required compact model of dual-port FeFET and validated it with the TCAD data to perform accurate circuit analysis using SPICE simulations.
The second layer consists of the circuit and architectural level analysis. Here, we have used our FeFET model to design the TCAM cells and simulated them with the industry-standard SPICE simulator Cadence Spectre. The simulation time of SPICE is significantly shorter than that of TCAD. Further, we implement an array of TCAM cells that performs CiM to calculate HD efficiently. Then, the confusion matrix, which consists of the probabilities of correctness between the input and output HDs due to the variations in the device, is calculated. For this, we ran 1,000 Monte-Carlo simulations in SPICE and extracted the probability matrix of the HD using Python scripts from the raw data.
Fig. 1. Overview of our reliability framework, which connects the device to the system. It shows how our investigation is done across three layers (i.e., device, circuit and system levels) in addition to an outermost layer representing the system level.
The third layer represents the system-level analysis. The calculated confusion matrix of the HD for each block is then used directly to calculate the loss in the inference accuracy of the HDC algorithm. The algorithm is developed in C++ and emulates the HDC inference step based on CiM architecture. The C++ implementation is embedded in a python wrapper to automate the analysis of different HDC parameters and processing of the different confusion matrices from the circuit configurations.

II. BACKGROUND AND PRELIMINARIES
The cross-layer framework of modeling starting from a single device up to the application level is shown in Fig. 2. In this section, the fundamental idea of each level is introduced, starting with FeFET technology at the device level, FeFET-based TCAM cell, and lastly, HDC as a Machine Learning application.

A. Conventional Single-Port Ferroelectric FET
Fig. 3(a) shows the structure of FeFET, which is similar to a MOSFET, with the exception of an FE material integrated into the gate stack. Table I    and compatibility with modern foundry processes [16]. The HfO2-based FE layer is composed of multiple domains [16]. A domain is a region with an equivalent polarization, either up or down. Polarization describes the orientation of the permanent electrical dipoles that arise from the non-centrosymmetric arrangement of atoms within the FE material. The polarization of the domains results in a surface charge (Q FE ) at the HfO2-SiO2 interface. Q FE is positive if the domains are polarized down and negative if the domains are polarized up.
In an n-type FeFET transistor ( Fig. 3(a)), a positive Q FE attracts more channel electrons. The positive Q FE within the FE layer is generated after applying a positive pulse (see Fig. 3(c)) at the front gate (FG) of the FeFET. Consequently, the channel conductance increases and the V TH decreases, setting the FeFET in a so-called low-V TH (LVT) state. Conversely, for up polarized domains, electrons are repelled from the channel. Thus, the channel conductance decreases, and hence, the V TH increases, setting the FeFET in a so-called high-V TH (HVT) state. A memory cell can be realized with these two distinct V TH states and MW = HVT -LVT. Fig. 3(d) shows the TCAD calibrated model for the 14 nm technology node FDSOI transistor. We have used Synopsys Sentaurus TCAD tool for the simulations [17]. This is used as the baseline transistor that is fitted with experimental data [15]. For the FeFET, the high-κ dielectric is replaced by the FE HfO 2 layer. FE parameters, such as saturation polarization (P s ), remnant polarization (P r ) and E c , are calibrated using the measured Q FE -V FE data from the metal-ferroelectric-metal capacitor [18]. The gate width (W g ) and gate length (L g ) of FeFET are both 100 nm for this work.
To write the binary state into such a memory, the V TH of the FeFET is set into HVT or LVT by applying a voltage pulse to the FG of the FeFET. A positive pulse larger than the coercive voltage V c , along with sufficient pulse width, changes the polarization to the downward direction, which sets the FeFET into the LVT state. A negative pulse sets it into the HVT state. To read the states from memory, a small read voltage (0.7 V) is later applied at the FeFET gate, and the current is sensed. As shown in Fig. 3(e), the drain-source current I DS can either be high when the FeFET is in the LVT state or low in the HVT state. The MW for the FG read is 1.8 V. The work function engineering is done in the range of 4.1 eV to 4.9 eV for the TiN metal gate, which is experimentally shown in [19] for the hysteresis shift. The novel dual-port FeFET is explained later in Section III.

B. FeFET-Based Ternary Content Addressable Memory
In traditional memory architectures, a memory address is used as the search key, and the content of the memory at that address is returned as the search result. In contrast, Ternary Content Addressable Memory (TCAM) is searched using the memory content, and the result is the memory address of the data. It also facilitates the parallel search operation, which accelerates these searches. TCAMs have been proposed to realize an Associative Memory (AM) [20] and are currently used in CPU caches or network routers [21], [22]. A single TCAM cell is typically implemented using two SRAM cells and access logic, for a total of 16 CMOS transistors. In contrast, FeFET-based TCAM designs need only two FeFETs because of their inherent non-volatility [20]. Fig. 4 depicts the n-type FeFET-based TCAM cells [23], where two FeFETs are connected in parallel. The FG of both FeFETs is used as the write port (WP) as well as the read port (RP). The TCAM cell is programmed by writing both FeFETs in a complementary manner. Logic "1" (Fig. 4(a)) is written into the cell by setting M1 into the HVT state through a negative voltage pulse (−4 V) to node WP/RP and M2 into the LVT state with a positive pulse (4 V) to node $\overline{\mathrm{WP/RP}}$. Similarly, logic "0" is written into the cell by setting M1 and M2 in LVT and HVT, respectively (Fig. 4(b)). To read the state of the TCAM cell, the matchline (ML) is precharged to V DD through a p-type transistor, then read voltages are applied to nodes WP/RP and $\overline{\mathrm{WP/RP}}$. The read voltages are chosen such that their value lies between the LVT and the HVT of the FeFET. In case of a match between the write and read states, the ML stays at its precharged value as none of the FeFETs is conducting. For example, if the TCAM cell stores logic "1", M1 is in HVT, whereas M2 is in LVT. For the match case, a read voltage (0.7 V) is applied at node WP/RP and 0 V at node $\overline{\mathrm{WP/RP}}$. As M1 is in HVT, 0.7 V at its FG (WP/RP) does not make it conduct any current. M2 also does not conduct due to 0 V at its FG ($\overline{\mathrm{WP/RP}}$). Since neither FeFET is conducting, ML does not discharge and indicates a match. Conversely, for the mismatch case, 0 V is applied at node WP/RP and 0.7 V at node $\overline{\mathrm{WP/RP}}$, which makes M2 conduct; ML discharges through it, indicating a mismatch.
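The match and mismatch cases described above can be summarized in a small behavioral sketch. It assumes each FeFET behaves as an ideal switch that conducts only when it is in the LVT state and its gate receives the read voltage; the function names `write_cell` and `search_cell` are illustrative and do not come from the paper.

```python
# Behavioral sketch of a two-FeFET TCAM cell (single-port, FG read).
# Assumption: a FeFET conducts only if it is in LVT AND its gate is driven
# with the read voltage (which lies between LVT and HVT).

HVT, LVT = "HVT", "LVT"

def write_cell(bit):
    """Return the (M1, M2) threshold states for a stored bit."""
    # Logic "1": M1 -> HVT, M2 -> LVT. Logic "0": M1 -> LVT, M2 -> HVT.
    return (HVT, LVT) if bit == 1 else (LVT, HVT)

def search_cell(stored_bit, query_bit):
    """Return True if the matchline stays precharged (match)."""
    m1, m2 = write_cell(stored_bit)
    # Searching for "1": read voltage on M1's gate, 0 V on M2's gate.
    # Searching for "0": the two gate voltages are swapped.
    gate_m1_high = (query_bit == 1)
    gate_m2_high = (query_bit == 0)
    m1_conducts = gate_m1_high and m1 == LVT
    m2_conducts = gate_m2_high and m2 == LVT
    # Any conducting FeFET discharges the matchline, signalling a mismatch.
    return not (m1_conducts or m2_conducts)
```

For a stored "1", `search_cell(1, 1)` keeps the matchline high, while `search_cell(1, 0)` turns on M2 and reports a mismatch, mirroring the example in the text.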

C. TCAM Array
The TCAM cells are connected in an array to form a b-bit block by sharing the ML, as shown in Fig. 5(a) [24]. Such a block's search capabilities can be used to compute an HD over a large array quickly and effectively in a single operation [25]. The number of discharge paths for the ML increases with the number of mismatches. Due to the increasing number of parallel discharge paths and the lower overall resistance, the discharge rate increases with the number of mismatches. By connecting a clocked self-referenced sense amplifier (CSRSA), the discharge rate of the ML is translated into a separate sensing window to detect and differentiate the number of mismatches. Fig. 5(b) shows the schematic of the CSRSA. When mismatches happen, the voltage SA OUT at the CSRSA's output drops sharply. The time interval between the edge of the CSRSA's enable signal and the output voltage decreasing below 10 % of V DD is known as the operation latency. The TCAM block computes the HD based on the measured latency. The result of the CSRSA for one to eight mismatches (i.e., HD) is displayed in Fig. 5(c). The margin between the latencies gets smaller as the number of mismatches increases.
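To see why the latency margin shrinks with the number of mismatches, a first-order RC picture is enough. The sketch below is not the CSRSA circuit: it assumes n identical conducting FeFETs in parallel discharging a fixed matchline capacitance, and the values of `r_on` and `c_ml` are hypothetical placeholders.

```python
import math

def discharge_time(n_mismatch, r_on=1e5, c_ml=1e-14, threshold=0.1):
    """Time for an RC-discharged matchline to fall to threshold * VDD.

    Assumption: n_mismatch identical paths of resistance r_on in parallel,
    so V(t) = VDD * exp(-t * n_mismatch / (r_on * c_ml)).
    """
    if n_mismatch <= 0:
        return math.inf  # no discharge path: the matchline stays precharged
    return (r_on * c_ml / n_mismatch) * math.log(1.0 / threshold)

# Latency for HD = 1..8 and the margin between neighbouring HDs.
latencies = [discharge_time(n) for n in range(1, 9)]
margins = [a - b for a, b in zip(latencies, latencies[1:])]
```

Because the latency scales as 1/n in this model, the margin between HD = n and HD = n + 1 scales as 1/(n(n + 1)), which is consistent with the shrinking margins reported for Fig. 5(c).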

D. Brain-Inspired Hyperdimensional Computing (HDC)
HDC is an emerging alternative to traditional machine learning methods [26], [27], [28]. In contrast to big neural networks, HDC uses high-dimensional vectors, typically with D = 10,000. Individual hypervector components can be of various data types, such as simple bits, integers, or real numbers [29]. Hypervectors are generated randomly and represent real-world data, such as voltage levels, pixels, or alphabetic letters. Multiple hypervectors are combined with three basic operations to map complex data into hyperspace. The implementation of each operation is determined by the data type of the components.
The first operation, bundling, combines several hypervectors into a single hypervector of the same dimension. Each bit in the resulting hypervector is determined by the majority operation. In the second operation, referred to as binding, two hypervectors are XORed to connect them together. In the third operation, permutation, the hypervector is rotated, which is useful for encoding sequences. Because each element is independent, all computations can be performed in parallel.
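For binary hypervectors, the three operations can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' C++ implementation; the helper names and the tie-breaking rule in `bundle` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimension used in the text

def random_hv(d=D):
    """Draw a random binary hypervector with components in {0, 1}."""
    return rng.integers(0, 2, size=d, dtype=np.uint8)

def bundle(hvs):
    """Bitwise majority vote over a list of hypervectors.

    Assumption: ties (possible only for an even number of inputs)
    resolve to 0.
    """
    stacked = np.stack(hvs)
    return (stacked.sum(axis=0) * 2 > len(hvs)).astype(np.uint8)

def bind(a, b):
    """Bind two hypervectors with a component-wise XOR."""
    return np.bitwise_xor(a, b)

def permute(hv, shifts=1):
    """Rotate the hypervector cyclically by `shifts` positions."""
    return np.roll(hv, shifts)
```

Binding is its own inverse (`bind(bind(a, b), b)` returns `a`), and a rotation is undone by rotating back, which is what makes these operations convenient for encoding and decoding structured data.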
For example, to encode a text into hyperspace, first, each letter is assigned to a unique, randomly generated hypervector. The hypervector is stored in the Item Memory (IM). The IM is then used to map the first n letters from the text into hyperspace. To encode its position, the i-th hypervector is permuted i times. The permuted hypervectors are bundled to form a single hypervector representing the n-gram. These steps are repeated until such n-grams cover the entire text. The procedure is repeated for various texts in various languages. The AM stores the connection between each hypervector and the language of its text.
To classify the language of an unknown text, the text is first encoded into a query hypervector using the aforementioned procedure. The similarity between the query hypervector and the stored hypervectors in the AM is calculated. The similarity metric for binary hypervectors is the HD. The bits of the query and stored hypervector are compared. If they do not match, the distance is increased by one. The HD indicates how similar two hypervectors are to one another. The classification result is the language corresponding to the class hypervector with the lowest HD to a given query vector.
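The classification step described above reduces to a nearest-neighbor search under the Hamming distance. The following sketch assumes binary hypervectors stored as NumPy arrays; the labels `lang_a` and `lang_b` and the dictionary layout of the AM are illustrative only.

```python
import numpy as np

def hamming_distance(a, b):
    """Count the positions in which two binary hypervectors differ."""
    return int(np.count_nonzero(a != b))

def classify(query, associative_memory):
    """Return the label whose class hypervector has the lowest HD to the query.

    associative_memory: dict mapping a label to its class hypervector.
    """
    return min(associative_memory,
               key=lambda label: hamming_distance(query, associative_memory[label]))

# Toy example with two hypothetical classes and a noisy query.
rng = np.random.default_rng(1)
am = {"lang_a": rng.integers(0, 2, size=1000, dtype=np.uint8),
      "lang_b": rng.integers(0, 2, size=1000, dtype=np.uint8)}
query = am["lang_a"].copy()
query[:50] ^= 1  # flip 50 bits to emulate an imperfect query
predicted = classify(query, am)
```

Two unrelated random hypervectors differ in roughly half of their bits, so a query that is 50 bits away from its class vector is still classified correctly with a large margin.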

III. DUAL-PORT FERROELECTRIC FET
The conventional FG read of HfO2-based FeFETs cannot have a high MW (Eq. (1)) given that t FE = 10 nm [30]. For sufficiently high P FE , the MW is often approximated by a simplified expression [30]:
$$\mathrm{MW} = 2\,\alpha\,E_c\,t_{FE} \qquad (1)$$
where α is an ideality factor for the FeFET that accounts for second-order effects and is generally lower than 1. E c is the coercive field, and t FE is the thickness of the FE layer. Considering the typical values for HfO2 films are t FE = 10 nm and E c = 1 MV cm −1 , the maximum MW of the FeFET is 2 V. As the MW is directly proportional to the t FE , a thicker FE oxide is required to realize a wider MW. However, FE properties degrade as t FE increases. This hinders device scaling [31], consequently placing FeFETs at a disadvantage compared to other matured technologies, like flash memory, which offer a significantly larger MW. Most notably, executing write and read operations from the same terminal in traditional FeFETs can flip the polarization in the gate stack during the read operation. This can cause read disturb. To mitigate the aforementioned issues, asymmetric double-gated FeFETs with dual-port have been proposed [6]. With a t FE = 10 nm, the dual-port FeFET shows a MW of 12 V. Additionally, it provides a disturb-free read due to separated read and write ports. In such an asymmetric double-gate FeFET, the writing of memory states is carried out by applying a write pulse to the FG of the FeFET, which contains the FE layer in its gate stack, whereas the memory states are read by applying a voltage to the back gate (BG), which has a non-FE dielectric layer as shown in Fig. 3(b). In such an arrangement, the FG acts as the "write port", while the BG acts as the "read port". Hence, a "dual-port" FeFET memory cell is realized which is completely read disturb-free, contrary to the conventional single-port FeFET. In other words, P FE is controlled through the FG, and the MW amplification stems from the BG read scheme.
Applying a voltage to the BG changes the V TH of the FeFET for the FG read, and vice versa. This happens due to the electrostatic coupling between the BG and FG (also referred to as the body effect) [32]. The drain current of an asymmetric double-gate FeFET is measured by applying a voltage to the BG. The polarization in the FE modulates the V TH of the FeFET. The MW for the BG read (MW BG ) is given by [8]:
$$\mathrm{MW}_{BG} = \gamma_{BF}\,\mathrm{MW}_{FG} \qquad (2)$$
where MW FG is the MW when FG is read, and γ BF is the body effect factor for the electrostatic coupling between FG and BG. γ BF is defined as the ratio of the equivalent series capacitance of BG, the channel, and the capacitance of the FG. In the case of the BG read, the electron channel is created close to the buried oxide layer (BOX) [7]. Generally, the BG oxide is thicker than the FG oxide, which results in γ BF greater than 1 and hence, MW BG increases. Using the Preisach model, the FeFET is simulated in TCAD by applying a pulse amplitude of 4 V. Fig. 3(e) shows the I-V characteristics of FeFET by sweeping the FG voltage (V FG ), whereas Fig. 3(f) shows the I-V characteristics of FeFET by sweeping the BG voltage (V BG ). We observe a tenfold amplification in the MW for the BG read. These results are indicated for 100 % P FE+ , i.e., all the domains in the FE layer are polarized up for the HVT case and down for the LVT case.
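As a quick numerical check of the memory-window figures above, the sketch below assumes the common approximation MW ≈ 2·α·E c·t FE for the FG read, which reproduces the 2 V bound quoted in the text for α = 1, and a linear amplification MW BG = γ BF · MW FG for the BG read. The value of `gamma_bf` in the example is a hypothetical placeholder, not a figure from the paper.

```python
def mw_front_gate(alpha, e_c_v_per_cm, t_fe_cm):
    """FG-read memory window in volts, assuming MW = 2 * alpha * E_c * t_FE."""
    return 2.0 * alpha * e_c_v_per_cm * t_fe_cm

def mw_back_gate(gamma_bf, mw_fg):
    """BG-read memory window, assuming linear body-effect amplification."""
    return gamma_bf * mw_fg

# Typical HfO2 values from the text: E_c = 1 MV/cm, t_FE = 10 nm = 10e-7 cm.
mw_fg_max = mw_front_gate(alpha=1.0, e_c_v_per_cm=1e6, t_fe_cm=10e-7)

# Hypothetical body-effect factor, used only to illustrate the scaling.
gamma_bf = 6.0
mw_bg_example = mw_back_gate(gamma_bf, mw_fg_max)
```

With α below 1 the FG window falls short of the 2 V upper bound, so any amplification of the window has to come from the body-effect factor rather than from the ferroelectric layer itself.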
However, not all the domains of the FE layer switch at the same time [33] and that leads to intermediate V TH states. Intermediate values of P FE provide intermediate V TH values. Fig. 6(a) shows the different levels of polarization with intermediate V TH values between HVT and LVT for both FG read and BG read. The total strength of polarization (%P FE+ ) in the FE layer is determined by the magnitude of the write voltage (WV) at the FG, assuming a constant pulse width. The relationship between WV and %P FE+ is shown in Fig. 6(b) for a pulse width of 2 µs. The intermediate V TH states can be employed as a trade-off for write energy reduction by reducing the WV or for the utilization of FeFETs as multi-level memory cells.

A. Our SPICE Modeling and Validation of FeFET
To simulate large-scale systems and circuits, it is necessary to have a computationally fast and reliable model for the underlying device. Although accurate, TCAD models are resource intensive and thus unsuitable for large-scale circuits and systems simulations. This necessitates a computationally fast and reliable compact model of a FeFET [34], [35].
To simulate the FeFET, we have combined the Preisach model-based FE capacitor with the FDSOI transistor. The Preisach model is a macro-level model that captures the Polarization (P) -Voltage (V) characteristics [36]. To capture the switching characteristics, an auxiliary voltage (V aux ) is defined. This represents the actual voltage to which the ferroelectric dipoles respond after relaxation: where τ v denotes the relaxation time for V aux . V in is the applied input voltage at the front gate of the FeFET. The corresponding polarization (P aux ) is given as: where V c is the coercive voltage of the FE material, P s is the saturation polarization, P r is the remnant polarization, m is the slope of the curve, and P off is the offset polarization. In Eq. (4), w is calculated using P r , P s and V c [37]. The local maxima and minima of the input voltage, termed as turning points, are calculated to mimic the effect of polarization history. An R-C delay network is used to calculate a delay voltage, which is compared with V aux to determine whether the input voltage is turning up or down. Instead of using direct differentiation of the input voltage to calculate the turning points, our approach saves computation time and improves convergence. The time constant of the R-C delay is chosen to be adequately low (≪τ v ) to not affect V aux . In order to simulate the complete FeFET, we combine the FE capacitor with the gate of the FDSOI transistor in series. The effect of BG bias is automatically included in the BSIM model, which can be used to tune the V TH of the FDSOI transistor [38], [39]. This allows us to perform BG read, where the inversion layer is actually formed at the Channel/BOX interface as shown in Fig. 3(b). The overall FE-FDSOI is then modeled by solving the charge balance (Q MOS = P FE *W*L) and voltage balance (V TOTAL = V FE + V MOS ). The SPICE model is validated against well-calibrated TCAD data. 
First, the TCAD model of the underlying FDSOI nMOS transistor was validated against transistor data obtained from fabricated industrial devices [15]. Then, the FE capacitor was also calibrated in TCAD against experimental data obtained from a fabricated FE capacitor as in [18]. Finally, the parameters for the FE material and other FE capacitor model parameters, along with the baseline FDSOI, are calibrated in the SPICE simulator Cadence Spectre for both the FG and BG read to get an excellent match with TCAD data (see Fig. 8). The trends observed in fabricated 22 nm FDSOI FeFETs [6], such as the increased MW and reduced subthreshold slope in case of BG read, are well captured by our model.
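To give a feel for the kind of Preisach-style description used for the FE capacitor, the sketch below uses the textbook saturated-loop tanh form together with a first-order relaxation of the auxiliary voltage. It is an assumption-laden illustration, not the authors' compact model: it omits the slope parameter m, the offset P off, and the turning-point (history) logic, and the function names are hypothetical.

```python
import math

def preisach_w(p_s, p_r, v_c):
    """Shape parameter w chosen so that the saturated loop passes through
    +/-P_r at zero voltage (textbook tanh form, assumed here)."""
    return math.log((p_s + p_r) / (p_s - p_r)) / (2.0 * v_c)

def saturated_branch(v, p_s, p_r, v_c, rising):
    """Polarization on the saturated hysteresis loop.

    rising=True  : branch followed while the voltage increases (P = 0 at +V_c)
    rising=False : branch followed while the voltage decreases (P = 0 at -V_c)
    """
    w = preisach_w(p_s, p_r, v_c)
    shift = -v_c if rising else v_c
    return p_s * math.tanh(w * (v + shift))

def relax(v_in_samples, dt, tau):
    """Forward-Euler integration of dV_aux/dt = (V_in - V_aux) / tau,
    an assumed first-order relaxation of the auxiliary voltage."""
    v_aux, trace = 0.0, []
    for v_in in v_in_samples:
        v_aux += dt * (v_in - v_aux) / tau
        trace.append(v_aux)
    return trace
```

For example, with P r = 23.9 µC/cm² (the 27°C value quoted later in the text), a hypothetical P s = 30 µC/cm², and V c = 1 V, the falling branch returns P r at zero voltage, which is the defining property of the remnant polarization.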

B. Dual-Port FeFET-Based TCAM Cell
The conventional FeFET has a single port at its FG that serves as both the RP and the WP. By contrast, the dual-port FeFET has separate RP and WP, located at the BG and FG respectively. For the TCAM based on dual-port FeFET (Fig. 7(a) and Fig. 7(b)), "1"/"0" is written by setting M1 to HVT/LVT and M2 to LVT/HVT by applying a negative/positive voltage to node WP and a positive/negative voltage to node $\overline{\mathrm{WP}}$. However, the read operation is done by applying read voltages of 10 V and 0 V to the separate nodes RP and $\overline{\mathrm{RP}}$. The applied read voltage lies between the LVT and HVT values of the FeFET. In case of a match, none of the FeFETs conducts and the ML stays at V DD , which represents logic "1". For the mismatch case, one of the FeFETs conducts and discharges the ML to ground, which represents logic "0".

IV. JOINT VARIABILITY MODELING OF FEFET
Variability in FeFETs severely affects the reliability of FeFET-based systems and the accuracy of the inference tasks that are being executed. In this section, we consider the sources of variability such as design-time variability, run-time variation, and inherent spatial variation of FE domains [7], [11], [40], [41]. The effect of these variability sources on FeFET-based circuits and systems is demonstrated in the following.

A. Intrinsic and Extrinsic Variations in Ferroelectric FET
FeFET variations can be detrimental to its operation as a memory. Stored states can be read incorrectly due to variations. Intrinsic ferroelectric switching variations cause a serious challenge for the FeFET, like spatial variations resulting from partial polarization at intermediate V TH values. The up and down polarized domains exist side-by-side and are randomly distributed across the FE layer above the channel. Therefore, the underlying channel exhibits variation in the potential caused by spatial fluctuation of the polarized domains, leading to variations in V TH [7]. Fig. 9(a) shows V TH distributions for 0 % P FE+ to 100 % P FE+ for the FG read FeFET. The maximum variation in the FeFET can be seen for 50 % P FE+ , in which maximum spatial variation of the domains is present. The minimum variation can be observed at the two extremes of 0 % P FE+ and 100 % P FE+ , in which no spatial variation of the domains is present. The variation at 0 % P FE+ and 100 % P FE+ is caused solely by the extrinsic variations (like design time variation) in the underlying transistor. The transistor suffers from conventional variability sources of deeply scaled transistors such as line edge roughness (LER), metal work function variation (WFV), and random dopant fluctuations (RDF). In the case of dual-port FeFETs, as the MW is amplified when reading from the BG, the variability is also increased compared to the FG read [7]. Fig. 9(b) shows the V TH distributions for 0 % P FE+ to 100 % P FE+ for the BG read at 27°C. It can be observed that the variation for the BG read is higher than the FG read. The larger BOX thickness, which is responsible for the amplification of MW, also causes the amplification in variation. Also, the LER, WFV, and RDF are higher in the case of BG read [7]. With decreased polarization strength, better compatibility with existing VLSI designs is obtained due to write voltage reduction and increased speed of operation. 
However, variability increases for partially polarized states.

B. Run-Time Variability
Another critical challenge for the FeFET transistors is the run-time variability due to an increase in temperature. As temperature increases, FE parameters like P r and E c decrease, which also results in the reduction of MW [10]. The P r decreases when temperature increases from 27°C to 85°C, which results in decrease of P FE+ . At 27°C, P r = 23.9 µC/cm 2 and at 85°C, P r = 15.33 µC/cm 2 [10]. Fig. 9(c) and Fig. 9(d) show the V TH distributions for 0 % P FE+ to 100 % P FE+ for the FG read and BG read respectively at 85°C. As compared to the individual FG and BG read at 27°C, variations increase at higher temperature. Higher temperature additionally increases the transistor's variations owing to reduced read margins, and degrades the performance of algorithms at the system level [24]. Similarly, the impact on the performance of algorithms due to FeFET variability at different temperatures and different levels of polarization has to be evaluated.

A. Device-Level Analysis
Simulations are conducted with the industry-standard SPICE simulator Cadence Spectre. Variations in the underlying transistor impact the TCAM cell and its operation latency. The current in the FeFET varies according to variations in V TH . As a result, the discharge rate varies, which in turn changes the operation latency. For the Monte-Carlo simulations, we have considered (i) the conventional variability sources of deeply scaled transistors such as line edge roughness, metal work function variation, and random dopant fluctuations, and (ii) the variation due to spatial fluctuation of ferroelectric domains, under different temperatures. All the variation sources are considered collectively, and the standard deviation (σ) of the threshold voltage is calculated. Then, the σ value is passed to the 'DELVTRAND' parameter of the BSIM-IMG model and 1000 Monte-Carlo simulations are done. The variation-free nominal operation latency is displayed in Fig. 5(c), while the results of the Monte-Carlo SPICE simulations are shown as histograms in Fig. 10. The results are obtained for both FG and BG read at different %P FE+ and temperatures.
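The step of collapsing several variation sources into one σ and drawing Monte-Carlo samples can be illustrated as follows. The sketch assumes the sources are independent and Gaussian, so that they combine as a root-sum-square; the text does not state how the authors combine them, and the per-source σ values below are hypothetical.

```python
import numpy as np

def combined_sigma(*sigmas):
    """Root-sum-square combination of per-source standard deviations.

    Assumption: the variation sources are statistically independent.
    """
    return float(np.sqrt(np.sum(np.square(sigmas))))

def sample_vth(vth_nominal, sigma, n=1000, seed=0):
    """Draw n threshold-voltage samples from a Gaussian around the nominal value."""
    rng = np.random.default_rng(seed)
    return vth_nominal + rng.normal(0.0, sigma, size=n)

# Hypothetical sigmas in volts: transistor-level sources and FE-domain fluctuation.
sigma_total = combined_sigma(0.03, 0.04)
samples = sample_vth(vth_nominal=0.5, sigma=sigma_total)
```

In a SPICE flow, each sampled shift would correspond to one Monte-Carlo run of the TCAM block, and the resulting latencies would populate histograms of the kind described for Fig. 10.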
As described in Section IV, variations are higher for the BG read FeFET compared with the FG read FeFET. FeFET variations for both FG and BG increase further at a higher temperature. The large amplification in variation is due to the increase in the effective channel capacitance, which increases the amplification factor γ BF in the case of the dual-port FeFET. Fig. 10(c) shows that the overlap in the operation latency for the BG read FeFET-based TCAM is larger than for the FG read FeFET-based TCAM (Fig. 10(a)) due to increased variations in the transistor. These overlaps in operation latency result in an incorrect evaluation of HD. For 70 % P FE+ , the overlap in operation latencies is greater than at 100 % P FE+ due to the addition of random spatial fluctuation of ferroelectric domains. A similar trend is observed for the overlap in operation latency at higher temperatures (Fig. 10(e) to 10(h)).

B. Circuit-Level Analysis
The variations in the devices affect the TCAM array output results. The overlap in the operation latencies increases due to higher variation in the FeFET. Fig. 11 shows the confusion matrices between the input and output HDs. The input HD is determined for the variation-free FeFETs, whereas the output HD is calculated for the variation-affected FeFETs. The probabilities in the confusion matrices give the percentage of cases in which the output HD matches the input HD. Comparing FG and BG read FeFET-based TCAM at 27°C, given a similar input HD, FG read has the higher probability of the correct output HD, as shown in Fig. 11(a) and 11(c). For the case of BG read FeFET-based TCAM at 70 % P FE+ (Fig. 11(d) and 11(h)), the probability between the input and output HD is widely distributed, which implies a completely wrong evaluation of the HD. In the case of run-time variation due to an increase in temperature, the correct operation of the TCAM cell is also affected. These probabilities are a simplified metric for visualization. The metric accounts for deviations from the nominal operation latency that are less than half the distance to the neighboring HD.

Fig. 13. System-level analysis: Loss in inference accuracy of HDC for the language dataset classification for a block size and precision of 8 bits. (a) Conventional single-port FeFET with 100 % P FE+ has an average accuracy loss of 0.2 %, whereas (b) for 70 % P FE+ it is 0.3 % at a dimension of 5000 bits. (c) For the dual-port FeFET with 100 % P FE+ , a loss of 0.4 % at a dimension of 5000 bits is obtained. However, (d) with 70 % P FE+ , the dual-port FeFET has an average loss of at least 10 %, even at higher dimensions. (e) The inference accuracy loss at 85°C for the conventional FeFET with 100 % P FE+ and (f) 70 % P FE+ converges at 0.3 % and 0.6 %, respectively, at a dimension of 6000 bits, whereas (g) for the dual-port FeFET with 100 % P FE+ , an accuracy loss of 0.7 % at 6000 bits is obtained. (h) With 70 % P FE+ , the dual-port FeFET has an average loss of at least 12 %, even at higher dimensions. So, accuracy is lost only when 70 % P FE+ is employed for the dual-port FeFET.
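The confusion-matrix metric of the circuit-level analysis can be illustrated with a minimal Monte Carlo sketch. It is not the framework used in this work. The function name `confusion_matrix`, the linear latency-versus-HD relation, and the Gaussian noise model with a single standard deviation `sigma` are all assumptions made for illustration only.

```python
import random


def confusion_matrix(max_hd, sigma, step=1.0, trials=2000, seed=0):
    """Estimate P(output HD | input HD) for a latency-based HD readout.

    Assumptions (hypothetical, not taken from the paper):
      * the nominal latency grows linearly with the HD: latency = hd * step
      * device variation acts as zero-mean Gaussian noise on the latency,
        with the same standard deviation `sigma` for every HD
    """
    rng = random.Random(seed)
    matrix = []
    for hd_in in range(max_hd + 1):
        counts = [0] * (max_hd + 1)
        for _ in range(trials):
            latency = hd_in * step + rng.gauss(0.0, sigma)
            # Decode to the nearest nominal latency. A sample is read
            # correctly only if it deviates by less than half the distance
            # to the neighboring HD.
            hd_out = min(max(round(latency / step), 0), max_hd)
            counts[hd_out] += 1
        matrix.append([c / trials for c in counts])
    return matrix


# Small variation keeps the probability mass on the diagonal,
# large variation spreads it over the neighboring HDs.
low_var = confusion_matrix(max_hd=8, sigma=0.1)
high_var = confusion_matrix(max_hd=8, sigma=0.6)
print(low_var[4][4], high_var[4][4])
```

In this toy model, a larger `sigma` plays the role of the stronger variation seen for BG read, partial polarization, or higher temperature; the actual latency distributions come from the device and circuit simulations.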

C. System-Level Analysis
The main process of an HDC inference is the HD computation. The HD of an entire vector cannot be calculated in a single block, because it is very difficult to differentiate HDs over 8 bits [42], as illustrated in Fig. 10. As the number of HD bits increases, the overlap in the distribution of operation latency increases, which spreads the probabilities away from the diagonal of the confusion matrix in the circuit-level analysis. Each class hypervector of size d is therefore divided into d/b TCAM blocks, each of which stores b bits, as shown in Fig. 12. Each row of the TCAM array stores a language class hypervector. The query hypervector is divided in the same way, and its divisions are applied to the corresponding TCAM blocks. Since each block has its own peripheral for detecting the HD, all computations for the entire AM are performed in parallel. A dedicated circuit also uses less energy than a general-purpose CPU.
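The block-wise HD computation described above can be sketched in software as follows. This is an illustrative sketch only: the function names are hypothetical, hypervectors are modeled as plain lists of bits, and the blocks are evaluated in a loop rather than in parallel as in the TCAM hardware.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def split_into_blocks(vector, b):
    """Split a hypervector of size d into d/b blocks of b bits each."""
    if len(vector) % b != 0:
        raise ValueError("vector length must be a multiple of the block size")
    return [vector[i:i + b] for i in range(0, len(vector), b)]


def blockwise_hd(class_hv, query_hv, b=8):
    """Return the per-block HDs and their sum for one class hypervector."""
    partial = [
        hamming_distance(c_blk, q_blk)
        for c_blk, q_blk in zip(split_into_blocks(class_hv, b),
                                split_into_blocks(query_hv, b))
    ]
    return partial, sum(partial)


# Toy example with d = 16 and b = 8, i.e. d/b = 2 blocks.
class_hv = [0, 1] * 8
query_hv = [0] * 16
partial, total = blockwise_hd(class_hv, query_hv, b=8)
print(partial, total)
```

Because the HD is additive over disjoint bit ranges, the sum of the per-block HDs equals the HD of the full hypervectors, so no block ever has to resolve more than b distinct distance levels.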
Inference accuracy loss for the language recognition dataset is shown for all tested configurations (P FE , temperature, read port) in Fig. 13. The larger impact of variations for BG read FeFET (Fig. 13(c)) is reflected in the results, especially at smaller dimensions. Inference accuracy loss can be as high as 1.8 % (on average 1.2 %) with a dimension of 1000 bits. In contrast, the accuracy loss for FG read (Fig. 13(a)) is 0.2 % at 27°C. Larger dimensions contain more redundant bits, which mitigate the effects of erroneous HD computations. At a dimension of 5000 bits, the loss stabilizes around 0.4 % for BG read.
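How erroneous block-level HD readouts propagate to the classification decision can be sketched with a toy associative-memory search. The error model below (each block HD is misread by ±1 with probability `error_rate`) is a hypothetical simplification, not the variation model of this work, and all function names are illustrative.

```python
import random


def random_hv(d, rng):
    """Random binary hypervector of dimension d."""
    return [rng.randint(0, 1) for _ in range(d)]


def flip_bits(hv, n_flips, rng):
    """Copy of hv with n_flips randomly chosen bits inverted."""
    out = list(hv)
    for i in rng.sample(range(len(hv)), n_flips):
        out[i] ^= 1
    return out


def noisy_total_hd(class_hv, query_hv, b, error_rate, rng):
    """Sum of per-block HDs, each possibly misread by +/-1 (assumed model)."""
    total = 0
    for i in range(0, len(class_hv), b):
        hd = sum(x != y for x, y in zip(class_hv[i:i + b], query_hv[i:i + b]))
        if rng.random() < error_rate:
            # Hypothetical readout error: off by one level, clamped to [0, b].
            hd = min(max(hd + rng.choice((-1, 1)), 0), b)
        total += hd
    return total


def classify(class_hvs, query_hv, b, error_rate, rng):
    """Index of the class hypervector with the smallest (noisy) total HD."""
    distances = [noisy_total_hd(c, query_hv, b, error_rate, rng)
                 for c in class_hvs]
    return distances.index(min(distances))


rng = random.Random(1)
classes = [random_hv(1024, rng) for _ in range(4)]
query = flip_bits(classes[2], 100, rng)  # noisy sample of class 2
print(classify(classes, query, b=8, error_rate=0.5, rng=rng))
```

In this sketch the per-block errors shift the total HD by at most one level per block, while the distance between unrelated hypervectors grows with the dimension, which is one way to picture why larger dimensions tolerate erroneous HD computations better.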
The impact of variations is stronger for 70 % P FE+ . Fig. 13(b) and 13(d) show the corresponding inference accuracy loss for FG and BG read, respectively. At a dimension of 1000 bits, the average loss is 1.7 % for the FG read, whereas it is extremely high for the BG read (around 40 %). The accuracy loss converges at a dimension of 5000 bits, with 0.3 % loss for the FG read. However, for the BG read with 70 % P FE+ , the loss is at least 10 %, even at higher dimensions.
Investigating the inference accuracy loss at the higher temperature is essential to see the impact of run-time variation in the FeFETs. Fig. 13(e) to 13(h) show the inference accuracy loss for FG and BG read at 100 % P FE+ and 70 % P FE+ at 85°C. The inference accuracy loss stabilizes at a dimension of 6000 bits. The accuracy loss is 0.3 % and 0.6 % for 100 % P FE+ FG read and 70 % P FE+ FG read, respectively. Further, the accuracy loss is 0.7 % for 100 % P FE+ BG read. However, for BG read with 70 % P FE+ , the loss does not stabilize even at higher dimensions at the higher temperature. In summary, BG read, partial polarization, and temperature rise magnify the inherent loss in inference accuracy caused by employing TCAM for HD computations.
Tradeoffs between simulation speed and accuracy are made at the system level. Owing to our integrated framework, the system-level analysis runs considerably faster than SPICE simulation. However, some accuracy is lost when the device-level behaviour is carried over to the system level, because parasitic effects and the layout of the circuit are not yet captured in our framework.

VI. CONCLUSION
In this work, we have demonstrated the effect of design-time variations, run-time variations, and stochastic variations due to spatial fluctuation of FE domains under different temperatures for dual-port FeFETs and compared them with conventional single-port FeFETs. The resulting effects on circuits (TCAM) and systems (HDC) are analyzed under V TH variation. The results show the advantages and disadvantages of using dual-port FeFET-based circuits and systems. The inference accuracy loss of HDC based on the dual-port FeFET is higher than that of the single-port FeFET at both 27°C and 85°C. We have also investigated the effect of a reduced polarization strength (70 % P FE+ ) on the inference accuracy of HDC. Using a single-port FeFET, reliable inference in HDC is obtained. This is in contrast to the dual-port FeFET, where the inference accuracy loss is unacceptably high and does not converge even for large hypervectors at both 27°C and 85°C. Therefore, when the dual-port FeFET with 70 % P FE+ is employed (i.e., a weak write voltage), the behaviour of the circuit and system breaks down and the accuracy of even robust HDC is lost. Future work should reduce the variability in the dual-port FeFET. New techniques are also needed for the TCAM array circuit so that it includes more cells, which would increase the number of bits per block and make our architecture more compact.
Simon Thomann (Member, IEEE) received the bachelor's and master's degrees in computer science from the Karlsruhe Institute of Technology (KIT), Germany, in 2019 and 2022, respectively. He is currently pursuing the Ph.D. degree with the Chair of Semiconductor Test and Reliability (STAR), University of Stuttgart. His research interests range from device to system level. He focuses especially on circuit design, emerging technologies, and cross-layer reliability modeling from device to circuit level.

Hussam Amrouch (Member, IEEE) received the Ph.D. degree (Hons.) (summa cum laude) from KIT in 2015. He is a Professor (W3) Heading Chair of AI processor design with the Technical University of Munich (TUM), Germany, and the Chair of Semiconductor Test and Reliability (STAR), University of Stuttgart, Germany. Prior to that, he was a Research Group Leader with the Karlsruhe Institute of Technology (KIT), where he was leading the research efforts in building dependable embedded systems. His research in HW security and reliability has been funded by the German Research Foundation (DFG), Advantest Corporation, and the U.S. Office of Naval Research (ONR). He has more than 200 publications in multidisciplinary research areas (including 83 journals) across the entire computing stack, starting from semiconductor physics to circuit design all the way up to computer-aided design and computer architecture. His main research interests include design for reliability and testing from device physics to systems, machine learning for CAD, HW security, approximate computing, and emerging technologies with a special focus on ferroelectric devices. He holds eight HiPEAC Paper Awards and three best paper nominations at top EDA conferences, such as DAC'16, DAC'17, and DATE'17, for his work on reliability. He has served on the technical program committees for many major EDA conferences, such as DAC, ASP-DAC, and ICCAD.
He is a Reviewer of many top journals, such as Nature Electronics, IEEE TRANSACTIONS ON