Spin Wave Based Approximate Computing

Spin Waves (SWs) enable the realization of energy efficient circuits as they propagate and interfere within waveguides without consuming noticeable energy. However, SW computing can be made even more energy efficient by taking advantage of the approximate computing paradigm, as many applications, such as multimedia and social media, are error-tolerant. In this paper, we propose a novel ultra-low energy Approximate Full Adder (AFA) and an Approximate 2-bit inputs Multiplier (AMUL). We validate the correct functionality of our proposal by means of micromagnetic simulations and evaluate the AFA figure of merit against state-of-the-art accurate SW, 7 nm CMOS, Spin Hall Effect (SHE), Domain Wall Motion (DWM), accurate and approximate 45 nm CMOS, Magnetic Tunnel Junction (MTJ), and Spin-CMOS FA implementations. Our results indicate that AFA consumes 43% and 33% less energy than state-of-the-art accurate SW and 7 nm CMOS FAs, respectively, saves 69% and 44% when compared with accurate and approximate 45 nm CMOS, respectively, and provides a 2 orders of magnitude energy reduction when compared with accurate SHE, accurate and approximate DWM, MTJ, and Spin-CMOS counterparts. In addition, it achieves the same error rate as the approximate 45 nm CMOS and Spin-CMOS FAs, whereas it exhibits a 50% lower error rate than the approximate DWM FA. Furthermore, it outperforms its contenders in terms of area by saving at least 29% chip real-estate. AMUL is evaluated and compared with state-of-the-art accurate SW designs and with 16 nm CMOS accurate and approximate designs. The evaluation results indicate that it saves at least 2x and 5x energy in comparison with the state-of-the-art SW designs and the 16 nm CMOS accurate and approximate designs, respectively, has an average error rate of 10%, while the approximate CMOS MUL has an average error rate of 13%, and requires at least 64% less chip real-estate.


I. INTRODUCTION
While in the last decades CMOS downscaling has enabled the high performance computing platforms required to process the huge amount of data induced by the information technology revolution [1], it is becoming very difficult to keep the same downscaling pace due to [2]: (i) the leakage wall, (ii) the reliability wall, and (iii) the cost wall. This suggests that Moore's law will come to an end soon and, as a result, researchers have started to explore different technologies (e.g., memristors [3][4][5][6], graphene devices [7][8][9], and spintronics [10-13]), among which Spin Wave (SW) stands apart as one of the most promising due to its [14][15][16][17][18][19][20]: (i) Ultra-low energy consumption: SW computing depends on wave interference instead of charge movements. (ii) Acceptable delay. (iii) High scalability: SW wavelengths can reach the nanometer range.
Driven by this potential to build energy efficient circuits, several SW based logic gates and circuits have been reported [16][17][18][19][20][21][22][23][24][25][26][27][28][29][30]. The Mach-Zehnder interferometer was utilized to build a SW NOT gate, which is considered to be the first SW computing device [21]. Moreover, XNOR, (N)AND, and (N)OR gates were reported by making use of the Mach-Zehnder interferometer [22][23][24]. Whereas the Mach-Zehnder interferometer utilizes the SW amplitude to perform logic operations, other devices utilize the SW phase, or both phase and amplitude, to build fanout enabled Majority, (N)AND, (N)OR, and X(N)OR gates [19,20,25]. Moreover, SW frequency was utilized as an additional parameter to improve the data storage and computing capabilities of multi-frequency Majority and X(N)OR gates [17,18]. In addition, physical realizations of Majority gates were demonstrated [26][27][28]. Furthermore, SW circuits were proposed at the conceptual level, i.e., without simulation or experimental results [29], at the simulation level, e.g., a 2-bit inputs SW multiplier [16] and a magnonic half-adder [31], as well as simulation based practical mm range prototypes [30].
All the aforementioned logic gates and circuits were designed to provide accurate results, whereas many current applications, like multimedia processing and social media, are error tolerant and, within certain bounds, are not fundamentally perturbed by computation errors [32].
Therefore, such applications can benefit from approximate computing circuits, which can save significant amounts of energy, delay, and area, while providing acceptable accuracy. In view of this, this paper introduces a novel energy efficient Approximate SW-based Full Adder (AFA) and an Approximate 2-bit inputs Multiplier (AMUL), and its main contributions can be summarized as follows:
• Developing and designing a SW based approximate FA: The proposed adder consists of one Majority gate and has a 25% error rate.
• Developing and designing a SW based Approximate 2-bit inputs MUL: The proposed AMUL is implemented using 3 AND gates and has a 10 % error rate.
• Validation of the proposed AFA and AMUL circuits by means of the MuMax3 software.
• Demonstrating the superiority: The performance of the proposed approximate circuits is assessed and compared with accurate and approximate state-of-the-art design counterparts. Our results indicate that AFA consumes 43% and 33% less energy than the accurate state-of-the-art SW and 7 nm CMOS counterparts, respectively, and saves 69% and 44% in comparison with accurate and approximate 45 nm CMOS, respectively. In addition, it saves more than 2 orders of magnitude in terms of energy when compared with accurate Spin Hall Effect (SHE) and Domain Wall Motion (DWM), accurate and approximate Magnetic Tunnel Junction (MTJ), and Spin-CMOS based counterparts. It also achieves the same error rate as the approximate 45 nm CMOS and Spin-CMOS FAs and a 50% lower error rate than the approximate DWM FA, and it requires at least 29% less chip real-estate than the other state-of-the-art designs. Moreover, AMUL saves at least 2x and 5x energy in comparison with accurate SW and 16 nm CMOS accurate/approximate designs, respectively, has an average error rate of 10%, while the approximate CMOS MUL has an average error rate of 12.5%, and requires at least 64% less chip real-estate.
The paper is organized as follows. Section II provides SW computing background. Section III introduces the proposed approximate circuits. Section IV presents the simulation setup and simulation results. Section V provides performance evaluation data and discusses variability and thermal noise effects, and Section VI concludes the paper.

II. SPIN WAVE BASED TECHNOLOGY BASICS
We explain the SW basics and computing paradigm in this section.

A. Spin Wave Fundamentals
The Landau-Lifshitz-Gilbert (LLG) equation describes the magnetization dynamics caused by the magnetic torque when the magnetization of a magnetic material is out of equilibrium [14]:

$$\frac{d\vec{M}}{dt} = -\gamma \left( \vec{M} \times \vec{H}_{eff} \right) + \frac{\alpha}{M_s} \left( \vec{M} \times \frac{d\vec{M}}{dt} \right) \qquad (1)$$

where γ is the gyromagnetic ratio, α the damping factor, M the magnetization, M_s the saturation magnetization, and H_eff the effective field, which contains the different magnetic interactions. In this work, the effective field is the summation of the external field, the exchange field, the demagnetizing field, and the magneto-crystalline field.
For small magnetic perturbations, Equation (1) can be linearized and results in wavelike solutions known as Spin Waves (SWs), which can also be seen as collective excitations of the magnetization within the magnetic material. Just like any other wave, a SW is completely described by its amplitude A, phase φ, frequency f, wavelength λ, and wavenumber k = 2π/λ. The relation between frequency f and wavenumber k is called the dispersion relation and is very important for the design of magnonic devices [14].

B. SW Computation Paradigm
The SW amplitude and phase can be used to encode information at different frequencies, which enables parallelism [14,17]. The interaction between multiple SWs present in the same waveguide is based on the interference principle. Figure 1a) presents an example of the interaction between 2 SWs excited with the same A, λ, and f in the same waveguide. If the 2 SWs have the same phase (∆φ = 0), they interfere constructively, resulting in a SW with higher amplitude, whereas if they are out of phase (∆φ = π), they interfere destructively, [...] and logic 0 otherwise.
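The interference principle described above can be illustrated with a minimal phasor sketch. It is not a micromagnetic model: it assumes ideal, equal-amplitude, equal-frequency waves, and it assumes the encoding logic 0 = phase 0 and logic 1 = phase π, which is a convention chosen here for illustration. The function names `superpose` and `majority_by_interference` are hypothetical.

```python
import cmath
import math
from itertools import product


def superpose(phases, amplitude=1.0):
    """Sum ideal waves of equal amplitude and frequency, given their phases in radians."""
    return sum(amplitude * cmath.exp(1j * phase) for phase in phases)


def majority_by_interference(bits):
    """Interfere three phase-encoded waves and read the result by phase detection.

    Assumed encoding (illustrative): logic 0 -> phase 0, logic 1 -> phase pi.
    """
    resultant = superpose([math.pi * b for b in bits])
    # A negative real part means the resultant wave has phase pi, i.e. logic 1.
    return 1 if resultant.real < 0 else 0


# Two waves with the same phase add up; two waves with a pi offset cancel.
print("in phase amplitude:    ", abs(superpose([0.0, 0.0])))
print("out of phase amplitude:", abs(superpose([0.0, math.pi])))

# Three interfering waves reproduce the Majority function of the encoded bits.
for bits in product((0, 1), repeat=3):
    print(bits, "->", majority_by_interference(bits))
```

With an odd number of interfering waves the resultant never vanishes in this idealized picture, which is why phase detection is sufficient for the Majority readout in the sketch.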

III. SW APPROXIMATE FUNCTIONS
In this section, we introduce and analyse the SW-based Approximate Full Adder (AFA) and 2-bit inputs Multiplier (AMUL).

A. SW Approximate Full Adder
Figure 2 presents the proposed Approximate FA (AFA) structure, which has 3 inputs X, Y, and C_i, and 2 outputs S and C_o, and is a 3-input Majority gate that evaluates MAJ(X, Y, C_i) [33]. AFA generates C_o without any error as it is detected as the Majority of X, Y, and C_i, which is also the case in accurate FAs. On the other hand, the sum S is approximated by the inverted Majority of the inputs. Table I presents the FA and AFA truth tables, which show that the approximate FA sum S_ap is erroneous when all inputs are 0 or all inputs are 1.
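The FA and AFA truth tables can be regenerated with a short sketch. It assumes, following the dimensioning discussion in this section, that the approximate sum is read as the inverted Majority of the inputs while the carry is the Majority itself. The function names are hypothetical.

```python
from itertools import product


def majority(x, y, c):
    """3-input Majority: 1 when at least two inputs are 1."""
    return int(x + y + c >= 2)


def accurate_fa(x, y, ci):
    """Accurate full adder, returns (S, Co)."""
    return x ^ y ^ ci, majority(x, y, ci)


def approximate_fa(x, y, ci):
    """Approximate full adder, returns (S_ap, Co).

    Assumption of this sketch: S_ap is the inverted Majority of the inputs.
    """
    co = majority(x, y, ci)
    return 1 - co, co


sum_errors = []
print("X Y Ci | S Co | S_ap Co")
for x, y, ci in product((0, 1), repeat=3):
    s, co = accurate_fa(x, y, ci)
    s_ap, co_ap = approximate_fa(x, y, ci)
    print(f"{x} {y} {ci}  | {s} {co}  | {s_ap}    {co_ap}")
    if s != s_ap:
        sum_errors.append((x, y, ci))

error_rate = len(sum_errors) / 8
print("erroneous sum for inputs:", sum_errors)
print("sum error rate:", error_rate)
```

Under this assumption the sum differs from the accurate one only for the all-0 and all-1 inputs, i.e. 2 of 8 combinations, which agrees with the 25% error rate stated for the AFA.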
To achieve the AFA behaviour, the design in Figure 2 has to be properly dimensioned. The waveguide width must be smaller than or equal to the SW wavelength λ, and the SW amplitude, wavelength, and frequency must be the same at every excitation cell. Furthermore, the structure dimensions must be precisely determined because the interference pattern depends on the locations of, and the distances between, the excitation and detection cells. For example, if constructive interference is desired when the SWs have the same phase (∆φ = 0) and destructive interference when the SWs are out of phase (∆φ = π), d_1, d_2, and d_3 must be equal to nλ (where n = 0, 1, 2, 3, ...). In addition, if the inverted Majority is of interest, which is the case for S, d_4 must be (n + 1/2) × λ, and if the non-inverted output is required, which is the case for C_o, d_5 must be nλ.

The AFA operation principle relies on a combined process of SW propagation and interference, as follows: First, SWs are excited at X and Y and propagate diagonally until they interfere constructively or destructively, depending on their phases, at the connection point. Then, the resulting SW propagates and interferes constructively or destructively with the SW excited at C_i at the next connection point. This interference generates the final SW, which travels toward the outputs, and MAJ(X, Y, C_i) is detected at [...]

[...] The AMUL inputs are the 2-bit operands X = (X_1, X_0) and Y = (Y_1, Y_0) and its 4-bit output is Q = (Q_3, Q_2, Q_1, Q_0). AMUL consists of 3 AND gates, which evaluate the AMUL outputs [...]

To evaluate the error rate, we note that in the accurate MUL the output bits are computed as Q_0 = AND(X_0, Y_0), Q_1 = XOR(AND(X_1, Y_0), AND(X_0, Y_1)), Q_2 = XOR(AND(AND(X_0, Y_1), AND(X_1, Y_0)), AND(X_1, Y_1)), and Q_3 = AND(AND(X_0, Y_0), AND(X_1, Y_1)), and present the corresponding truth tables in Table II. The Table shows that AMUL computes Q_0 without any error, and Q_1, Q_2, and Q_3 with 31.25%, 6.25%, and 6.25% error rate, respectively.
However, if threshold based output detection is utilized, the error rate for Q_1 and Q_3 can be reduced to 25% and 0%, respectively, as demonstrated in Section IV, which brings our proposal to an average error rate of 10%.
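Per-bit error rates of this kind are obtained by comparing an approximate output bit against the accurate one over all 16 input combinations. The sketch below writes out the accurate 2-bit multiplier in gate form (standard binary multiplication) and provides a hypothetical helper, `bit_error_rate`, for scoring a candidate approximation. The candidate used in the usage lines, Q_2 ≈ AND(X_1, Y_1), is an assumption chosen purely for illustration and is not claimed to be the AMUL gate assignment.

```python
from itertools import product


def accurate_mul_bits(x1, x0, y1, y0):
    """Accurate 2-bit x 2-bit multiplier in gate form, returns a dict of output bits."""
    return {
        "Q0": x0 & y0,
        "Q1": (x1 & y0) ^ (x0 & y1),
        "Q2": ((x0 & y1) & (x1 & y0)) ^ (x1 & y1),
        "Q3": (x0 & y0) & (x1 & y1),
    }


def bit_error_rate(name, approx_fn):
    """Fraction of the 16 input combinations where approx_fn differs from the accurate bit."""
    errors = 0
    for x1, x0, y1, y0 in product((0, 1), repeat=4):
        exact = accurate_mul_bits(x1, x0, y1, y0)[name]
        errors += int(exact != approx_fn(x1, x0, y1, y0))
    return errors / 16


# Sanity check: the gate-level bits agree with integer multiplication.
for x1, x0, y1, y0 in product((0, 1), repeat=4):
    q = accurate_mul_bits(x1, x0, y1, y0)
    value = 8 * q["Q3"] + 4 * q["Q2"] + 2 * q["Q1"] + q["Q0"]
    assert value == (2 * x1 + x0) * (2 * y1 + y0)

# Illustrative candidate (an assumption of this sketch): drop the carry term of Q2.
q2_candidate_rate = bit_error_rate("Q2", lambda x1, x0, y1, y0: x1 & y1)
print("candidate Q2 error rate:", q2_candidate_rate)
```

Dropping the carry term changes Q_2 only for the single input combination in which all four input bits are 1, so this candidate is wrong for 1 of 16 inputs.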
The previously mentioned design parameters hold true for the AMUL as well. However, in contrast to AFA, AMUL relies on threshold based output detection, which means that the detection cells must be as close as possible to the last interference point; thus, d_4, d_5, d_6, and d_7 should be minimized.
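The dimensioning rules of this section, i.e., a distance of nλ for a non-inverted readout and (n + 1/2)λ for an inverted one, follow from the phase kd = 2πd/λ that a SW accumulates over a distance d. The sketch below classifies a path accordingly. Distances are expressed in units of the wavelength to avoid assuming any physical dimensions, and the helper name `classify_path` and its tolerance are hypothetical.

```python
import math


def path_phase_shift(distance, wavelength):
    """Phase accumulated over the path, 2*pi*d/lambda, wrapped to one period."""
    return 2 * math.pi * ((distance / wavelength) % 1.0)


def classify_path(distance, wavelength, tol=1e-6):
    """Classify a path as 'non-inverting' (n*lambda), 'inverting' ((n+1/2)*lambda) or 'neither'."""
    frac = (distance / wavelength) % 1.0
    if min(frac, 1.0 - frac) < tol:
        return "non-inverting"
    if abs(frac - 0.5) < tol:
        return "inverting"
    return "neither"


wavelength = 1.0  # arbitrary unit; all distances below are multiples of the wavelength
for d in (0.0, 1.0, 1.5, 2.0, 2.5, 0.25):
    phase = path_phase_shift(d, wavelength)
    print(f"d = {d} lambda: phase = {phase:.3f} rad, {classify_path(d, wavelength)}")
```

A path that is neither a whole nor a half-integer number of wavelengths leaves the detected wave at an intermediate phase, which is one way to see why the text requires the distances to be precisely determined.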

IV. SIMULATION SETUP AND RESULTS
The simulation setup and simulation results are provided and explained in this section.

[...] Similarly, one can analyze Figure 6. For instance, the SW magnetization for the [...] In the same way, Figure 7 is analyzed. The SW magnetization for input [...] Finally, Figure 8 is analyzed in the same manner. The SW magnetization for input combination X_1 Y_1 X_0 Y_0 = {1111} is larger than 0.0014 M_s when read at time 2.76 ns, whereas the magnetization for the remaining input combinations is less than 0.0014 M_s. Therefore, if the threshold is set to 0.0014 M_s, Q_3 can be obtained with 0% error rate.

[Caption: 2-bit inputs approximate MUL based on threshold detection]
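Threshold based detection as used for Q_3 amounts to comparing the detected magnetization with a fixed level. The sketch below uses the 0.0014 M_s threshold quoted in the text. The two amplitudes in the usage lines are placeholders chosen only to fall on either side of the threshold; they are not simulation results. The function name `threshold_detect` is hypothetical.

```python
Q3_THRESHOLD = 0.0014  # threshold in units of M_s, as quoted in the text


def threshold_detect(magnetization_over_ms, threshold=Q3_THRESHOLD):
    """Return logic 1 if the detected magnetization (in units of M_s) exceeds the threshold."""
    return 1 if magnetization_over_ms > threshold else 0


# Placeholder amplitudes (NOT simulation data), one above and one below the threshold.
above = 0.0020
below = 0.0005
print("above threshold ->", threshold_detect(above))
print("below threshold ->", threshold_detect(below))
```

Since only the {1111} input combination is reported to exceed the threshold at the readout time, a detector of this form yields Q_3 = 1 for that combination alone, which matches the accurate Q_3.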

V. PERFORMANCE EVALUATION AND DISCUSSION
In this section, the proposed AFA and AMUL are evaluated and compared with the state-of-the-art designs. Furthermore, the variability and thermal noise effects are discussed in addition to some open issues related to SW technology.

A. Performance Evaluation
[...] The AFA delay is calculated by adding the ME cell delay to the SW propagation delay through the waveguide determined by means of micromagnetic simulation and equals 1.84 ns.

B. Variability and Thermal Effect
In this paper, the main target is to propose the approximate FA and MUL and to validate them by means of micromagnetic simulations as a proof of concept, without considering the impact of thermal noise and variability. However, it was reported that thermal noise has a limited effect on the gate function and, consequently, the gate works correctly at different temperatures [45]. In addition, the effects of edge roughness and of a trapezoidal waveguide cross section were demonstrated [45]. It was suggested that both effects are very small and that the gate operates correctly in their presence as well [45]. Therefore, we expect neither the thermal noise nor the geometrical variability to have a large impact on the proposed circuits. However, we plan to investigate these phenomena in the future.

C. Discussion
Although the evaluation demonstrated that the SW technology has what is needed to improve the state-of-the-art in terms of energy as well as area consumption, a number of open issues are still to be solved [14]:
• Immature technology: It seems that ME cells are the right option to excite and detect SWs because of their ultra-low energy consumption, acceptable delay, and scalability. However, ME cells have not yet been realized experimentally.
• Scalability: In terms of area, SW circuits have great scaling potential because, for proper functionality, SW device dimensions must be greater than or equal to the SW wavelength, which can reach down to the nm range. Several SW circuit area benchmarks have been reported [42], which indicate that hybrid spin-wave-CMOS circuits have a very small area. Although the assumptions the benchmarking is based on might not be fully realistic, they give an indication regarding the expected area. For example, the area of a 32-bit divider (DIV32) implemented in hybrid SW-CMOS is roughly 3.5x smaller than that of its 10 nm CMOS counterpart. However, a few things are needed before nano-scale SW devices can be realized, such as suitable excitation and detection: currently, it is not possible to distinguish nm range SWs from noise.

VI. CONCLUSIONS
We proposed and validated by means of micromagnetic simulations a novel approximate energy efficient spin wave based Full Adder (AFA) and 2-bit inputs multiplier (AMUL).
Both designs were evaluated and compared with their state-of-the-art counterparts. AFA saves 43% and 33% energy when compared with the state-of-the-art SW and 7 nm CMOS, respectively, and 69% and 44% in comparison with accurate and approximate 45 nm CMOS, respectively. In addition, it saves more than 2 orders of magnitude when compared with accurate SHE, and accurate and approximate DWM, MTJ, and Spin-CMOS FAs. Moreover, it achieves the same error rate as the approximate 45 nm CMOS and Spin-CMOS FAs, whereas it exhibits a 50% lower error rate than the approximate DWM FA and requires at least 29% less chip real-estate in comparison with the other state-of-the-art designs. In turn, AMUL saves at least 2x and 5x energy in comparison with the state-of-the-art accurate SW designs and the 16 nm CMOS accurate and approximate designs, respectively. Moreover, the AMUL has an average error rate of 10%, while the approximate CMOS MUL has an average error rate of 12.5%, and requires at least 64% less chip real-estate.