Spin Wave Based 4-2 Compressor

By their very nature, Spin Waves (SWs) consume ultra-low amounts of energy, which makes them suitable for ultra-low-energy applications. In addition, a compressor can be utilized to further reduce the energy consumption and enhance the speed of a multiplier. Therefore, we propose a novel energy-efficient SW based 4-2 compressor consisting of 4 XOR gates and 2 Majority gates. The proposed compressor is validated by means of micromagnetic simulations and compared with the state-of-the-art SW, 22nm CMOS, Magnetic Tunnel Junction (MTJ), Domain Wall Motion (DWM), and Spin-CMOS technologies. The performance evaluation shows that the proposed compressor consumes 2.5x and 1.25x less energy than the 22nm CMOS and the conventional SW compressors, respectively, and at least 3 orders of magnitude less energy than the MTJ, DWM, and Spin-CMOS designs. Furthermore, the compressor occupies the smallest chip area. In summary, the performance evaluation of the proposed compressor shows that SW technology has the potential to advance the state of the art in circuit design in terms of energy consumption and scalability.


I. INTRODUCTION
Complementary Metal Oxide Semiconductor (CMOS) downscaling has been an effective way to meet the exploding market demand for highly efficient computing platforms that process the raw data produced by the information technology revolution [1]. However, CMOS downscaling becomes increasingly difficult as we approach the end of Moore's law because of the leakage, cost, and reliability walls [2]. Therefore, researchers have explored alternative technologies, including spintronics [3]. One of the most promising spintronic technologies is Spin Wave (SW) technology, owing to its ultra-low energy consumption, acceptable delay, and high scalability [4][5][6][7][8]. As a result, there is a strong interest in designing SW based circuits.
Researchers have designed various logic gates and circuits using SWs [5][6][7][9][10][11][12][13][14]. A Mach-Zehnder interferometer was utilized to build the first experimental SW NOT gate [9]. Afterwards, single-output Majority, (N)AND, (N)OR, and X(N)OR gates were built using Mach-Zehnder interferometers [9], whereas multi-output logic gates were suggested [7][10][12]. Moreover, multi-frequency logic gates were reported [6][11]. On a larger scale, several circuits have been introduced at the conceptual [13], simulation [5], and experimental millimeter-scale [14] levels. In conclusion, SW circuit design is still in its infancy. Therefore, the design, validation, and demonstration of SW based circuits at different complexity scales are of great interest for progressing SW computing.
Motivated by the above, we propose, validate, and assess a novel SW based 4-2 compressor consisting of 4 XOR and 2 Majority gates. The main contributions of this paper are summarized as follows:
• Designing a novel SW 4-2 compressor.
• Validating the proposed 4-2 compressor by means of micromagnetic simulations.
• Demonstrating the compressor's superiority by comparing its performance with the state-of-the-art SW, 22 nm CMOS, Magnetic Tunnel Junction (MTJ), Domain Wall Motion (DWM), and Spin-CMOS technologies. The evaluation results show that the proposed compressor consumes 1.25x less energy than the conventional SW compressor and 2.5x less energy than its 22 nm CMOS counterpart. In addition, it outperforms the MTJ, DWM, and Spin-CMOS designs by at least 3 orders of magnitude in energy consumption. Furthermore, it occupies the smallest chip area.

The paper is organized as follows. We explain the SW background and computing paradigm in Section II. Next, we illustrate the proposed compressor in Section III, and present the simulation setup, results, and performance evaluation in Section IV. Section V concludes the paper.

COMPUTING PARADIGM
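The magnetization dynamics in a ferro- or ferrimagnetic material is described by the Landau-Lifshitz-Gilbert (LLG) equation:
$$\frac{d\mathbf{M}}{dt} = -|\gamma|\mu_0\left(\mathbf{M}\times\mathbf{H}_{\mathrm{eff}}\right) + \frac{\alpha}{M_s}\left(\mathbf{M}\times\frac{d\mathbf{M}}{dt}\right),$$
where $\gamma$ is the gyromagnetic ratio, $\mu_0$ the vacuum permeability, $\mathbf{M}$ the magnetization, $M_s$ the saturation magnetization, $\alpha$ the damping factor, and $\mathbf{H}_{\mathrm{eff}}$ the effective field consisting of the external field, the exchange field, the demagnetizing field, and the magneto-crystalline anisotropy field.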
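To make the two terms of the LLG equation concrete, the sketch below integrates it for a single uniform "macrospin" $\mathbf{m} = \mathbf{M}/M_s$, using the equivalent explicit Landau-Lifshitz form $d\mathbf{m}/dt = -\frac{|\gamma|\mu_0}{1+\alpha^2}\left[\mathbf{m}\times\mathbf{H}_{\mathrm{eff}} + \alpha\,\mathbf{m}\times(\mathbf{m}\times\mathbf{H}_{\mathrm{eff}})\right]$. It is a minimal illustration under stated assumptions, not the micromagnetic setup used in this work: $\mathbf{H}_{\mathrm{eff}}$ is reduced to a static external field (no exchange, demagnetizing, or anisotropy contributions), and the field, damping, and time step are hypothetical values chosen for readability. The first term makes $\mathbf{m}$ precess around the field, and the damping term pulls it toward the field direction; SWs arise when this precession is coupled between neighboring regions, which a single macrospin cannot show.

```python
import math

# Assumed constants (SI units): electron gyromagnetic ratio and vacuum permeability.
GAMMA = 1.760859e11          # |gamma| in rad/(s*T)
MU0 = 4e-7 * math.pi         # vacuum permeability in T*m/A


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def llg_rhs(m, h_eff, alpha):
    """dm/dt for the unit vector m = M/M_s (explicit Landau-Lifshitz form of LLG)."""
    pref = -GAMMA * MU0 / (1.0 + alpha ** 2)
    mxh = cross(m, h_eff)          # precession term
    mxmxh = cross(m, mxh)          # damping term
    return tuple(pref * (mxh[i] + alpha * mxmxh[i]) for i in range(3))


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def integrate_macrospin(m0, h_eff, alpha, dt, steps):
    """Classic RK4 integration; m is renormalized after every step to keep |m| = 1."""
    m = normalize(m0)
    for _ in range(steps):
        k1 = llg_rhs(m, h_eff, alpha)
        k2 = llg_rhs(tuple(m[i] + 0.5 * dt * k1[i] for i in range(3)), h_eff, alpha)
        k3 = llg_rhs(tuple(m[i] + 0.5 * dt * k2[i] for i in range(3)), h_eff, alpha)
        k4 = llg_rhs(tuple(m[i] + dt * k3[i] for i in range(3)), h_eff, alpha)
        m = normalize(tuple(m[i] + dt / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                            for i in range(3)))
    return m


# Hypothetical example: 1e5 A/m field along z, alpha = 0.05, m starting along x,
# integrated for 5 ns with a 0.2 ps time step.
H = (0.0, 0.0, 1e5)
m_final = integrate_macrospin((1.0, 0.0, 0.0), H, alpha=0.05, dt=2e-13, steps=25_000)
print(m_final)
```

For a uniform field along z, the damping term alone drives $m_z$, giving $m_z(t) = \tanh(t/\tau)$ with $\tau = (1+\alpha^2)/(\alpha|\gamma|\mu_0 H)$ for $m_z(0) = 0$, which is a convenient check on the integrator.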
For small magnetic disturbances, the LLG equation predicts wave-like magnetization dynamics, i.e., spin waves. The SW amplitude and phase can be used to encode information at different frequencies [4][6].
Moreover, this information is processed based on the interference principle. For example, if two SWs with the same amplitude, wavelength, and frequency meet in a waveguide, they interfere constructively if they are in phase, i.e., ∆φ = 0, and destructively if they have opposite phases, i.e., ∆φ = π. In addition, SWs natively support Majority gates because the interference of an odd number of SWs inherently performs a Majority decision. For instance, if 3 SWs with the same amplitude, wavelength, and frequency meet in the same waveguide, the interference result is a SW with phase 0 if at least 2 SWs have a phase of 0, and a SW with phase π if at least 2 SWs have a phase of π. Note that such a Majority gate requires 18 transistors in CMOS technology, whereas it can be directly implemented in SW technology [4]. In this paper, logic 0 corresponds to a SW with phase 0, whereas logic 1 corresponds to a SW with phase π.
The output can be detected using two methods: phase detection and threshold detection. In phase detection, the resultant SW phase is compared with a predefined phase: if its phase is 0, the output is logic 0; otherwise, it is logic 1. In threshold detection, on the other hand, the dynamic magnetization amplitude is compared with a predefined threshold: if the amplitude is larger than the threshold, the output is logic 0; otherwise, it is logic 1 [4].
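The interference-based Majority gate and the two detection methods can be sketched with an idealized phasor model. This is a minimal illustration under stated assumptions, not a physical simulation: each SW is represented by a complex amplitude $A e^{i\phi}$ with $\phi = 0$ for logic 0 and $\phi = \pi$ for logic 1, all waves have equal amplitude and frequency, propagation is lossless, and every path length is a multiple of λ so no extra propagation phase appears. The function names and the threshold value 1.0 are hypothetical choices for illustration.

```python
import cmath
import math
from itertools import product


def sw(bit, amplitude=1.0):
    """Encode a logic bit as a SW phasor: logic 0 -> phase 0, logic 1 -> phase pi."""
    return amplitude * cmath.exp(1j * math.pi * bit)


def interfere(*waves):
    """Ideal linear superposition of same-frequency SWs meeting at one point."""
    return sum(waves)


def phase_detect(wave):
    """Phase detection: phase near 0 -> logic 0, phase near pi -> logic 1."""
    return 0 if wave.real > 0 else 1


def threshold_detect(wave, threshold):
    """Threshold detection: amplitude above threshold -> logic 0, otherwise logic 1."""
    return 0 if abs(wave) > threshold else 1


def sw_majority(a, b, c):
    """Three interfering SWs read out by phase detection give Majority(a, b, c)."""
    return phase_detect(interfere(sw(a), sw(b), sw(c)))


def sw_xor(a, b):
    # Constructive interference gives amplitude 2, destructive gives ~0;
    # a threshold of 1 (half the constructive amplitude) separates the two cases.
    return threshold_detect(interfere(sw(a), sw(b)), threshold=1.0)


for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "MAJ =", sw_majority(a, b, c))
```

Reading two interfering waves with threshold detection yields XOR, because equal inputs interfere constructively (large amplitude, logic 0) and unequal inputs cancel (small amplitude, logic 1).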

III. SW 4-2 COMPRESSOR
A fast multiplier consists of three main stages: partial product generation, partial product reduction, and final product computation; most of the energy consumption and delay originates from the partial product reduction stage. This stage can be optimized by utilizing 4-2 compressors [15]. Therefore, we built a SW 4-2 compressor.
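One way to see why 4-2 compressors help in the reduction stage is to count reduction levels in a simplified, Wallace-tree style row-count model. In this sketch (an illustration under stated assumptions, with a hypothetical function name), a 3:2 counter turns 3 partial-product rows into 2 and a 4-2 compressor turns 4 rows into 2, with leftover rows passing through. The model ignores the carries exchanged between neighboring columns and the fact that one 4-2 level has a longer internal path than one 3:2 level, so level counts are not directly delays.

```python
def reduction_levels(rows: int, group: int) -> int:
    """Count idealized reduction levels until at most two rows remain.

    group=3 models 3:2 counters (full adders), group=4 models 4-2 compressors.
    Each counter turns `group` rows into two; leftover rows pass through unchanged.
    When fewer than `group` rows are left (e.g. 3 rows with 4-2 compressors),
    a 3:2 counter is assumed for the last step.
    """
    if group not in (3, 4):
        raise ValueError("group must be 3 (3:2 counter) or 4 (4-2 compressor)")
    levels = 0
    while rows > 2:
        g = group if rows >= group else 3
        rows = 2 * (rows // g) + rows % g
        levels += 1
    return levels


for n in (8, 16, 32):
    print(n, "rows:", reduction_levels(n, 3), "levels of 3:2 counters vs",
          reduction_levels(n, 4), "levels of 4-2 compressors")
```

For 8 partial-product rows, the model needs 4 levels of 3:2 counters (8, 6, 4, 3, 2 rows) but only 2 levels of 4-2 compressors (8, 4, 2 rows), and the regular halving structure also simplifies wiring.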
To ensure the correct functionality of the proposed 4-2 compressor, all SWs must be excited with the same amplitude, wavelength, and frequency. The SW wavelength must be larger than the waveguide width to simplify the interference pattern. Moreover, the structure must be designed carefully because its dimensions affect the interference results. For example, if constructive interference is required at the intersection point when the waves have the same phase, and destructive interference otherwise, then the device dimensions $d_1$, $d_2$, $d_3$, $d_5$, $d_6$, $d_7$, and $d_8$ must equal $n\lambda$, where $n = 0, 1, 2, \ldots$; this is the case in our design. The outputs $C_{o1}$ and $C_{o2}$ must be placed at specific positions because they are based on phase detection. Hence, by changing their location, either the inverted or the non-inverted output can be extracted. For example, to capture the non-inverted output, the distance $d_4$ must equal $n\lambda$, which is the case for $C_{o1}$ and $C_{o2}$, whereas a distance of $(n + 1/2)\lambda$ yields the inverted output. On the other hand, as the output S is based on threshold detection, the resultant SW amplitude is compared with a predefined threshold value, as previously discussed. To detect the largest possible SW amplitude, the output S must be located as close as possible to the interference point, i.e., $d_9$ must be as small as possible.
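The placement rule for the phase-detected outputs follows from the propagation phase $k d = 2\pi d/\lambda$: a distance of $n\lambda$ adds a multiple of $2\pi$ (non-inverted output), while $(n + 1/2)\lambda$ adds an extra π (inverted output). The sketch below checks a candidate distance; it is a simplified illustration that ignores any additional phase introduced by junction geometry and damping along the waveguide, and the function names and the normalization λ = 1 are hypothetical.

```python
import math


def propagation_phase(distance: float, wavelength: float) -> float:
    """Phase k*d accumulated over `distance`, with k = 2*pi/wavelength, wrapped to [0, 2*pi)."""
    return (2.0 * math.pi * distance / wavelength) % (2.0 * math.pi)


def output_is_inverted(distance: float, wavelength: float, tol: float = 1e-6) -> bool:
    """True if a phase-detected output at `distance` sees an extra pi shift (inverted output)."""
    phi = propagation_phase(distance, wavelength)
    if min(phi, 2.0 * math.pi - phi) < tol:
        return False                      # d = n * lambda: non-inverted
    if abs(phi - math.pi) < tol:
        return True                       # d = (n + 1/2) * lambda: inverted
    raise ValueError("distance is neither n*lambda nor (n + 1/2)*lambda")


# Lengths in units of the SW wavelength (hypothetical normalization, lambda = 1).
for d4 in (0.0, 1.0, 2.5, 3.0):
    print(d4, "inverted" if output_is_inverted(d4, 1.0) else "non-inverted")
```

Raising an error for other distances reflects that, with phase detection, an output placed at neither $n\lambda$ nor $(n + 1/2)\lambda$ would not produce a clean logic value.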
The proposed 4-2 SW compressor works as follows:
• Carry-out1 output $C_{o1}$: The SWs excited at X2 and X3 interfere constructively or destructively at their intersection point, depending on their phases. The resulting SW then propagates further through the waveguide and interferes with the SW excited at X1 at the next waveguide intersection. Finally, the resultant SW is captured at $C_{o1}$ based on phase detection, i.e., $C_{o1}$ is the Majority of X1, X2, and X3.
• Carry-out2 output $C_{o2}$: The SWs excited at X2 and X3 interfere constructively or destructively at their intersection point, depending on their phases. The resultant wave is then received by repeater I1, which excites a SW whose phase depends on the received SW magnetization: if the received magnetization is larger than a threshold, a SW with phase 0 is excited; otherwise, a SW with phase π is excited. Next, the SW excited by I1 interferes with the SW excited at X1, and the result is received at the waveguide intersection by repeater I2, which again excites a SW with the corresponding phase. Meanwhile, the SWs excited at X4 and Ci interfere at their intersection point. Finally, the resultant SW interferes with the SW excited by I2, and the result is captured at $C_{o2}$ based on phase detection.
• Sum output S: The SWs excited at X4 and Ci interfere at the intersection point between the two waveguides, and the result is detected by repeater I3. Repeater I3 then excites a SW with the corresponding phase, as previously discussed. Finally, the output S captures the result of the interference between the SWs excited by I2 and I3 based on threshold detection.
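The signal flow described in the three bullets can be checked at the Boolean level with a behavioral model. In this sketch, threshold-detected interference of two waves acts as XOR, so each repeater (which re-excites phase 0 for a large received amplitude and phase π otherwise) is modeled as an XOR, and phase-detected interference of three waves acts as Majority. It assumes ideal interference and repeaters, and it assumes the second interference in the $C_{o2}$ path combines the wave from I1 with X1, so that I2 carries X1 ⊕ X2 ⊕ X3, as required by the standard 4-2 compressor built from two full adders. The check uses the standard 4-2 compressor identity $X_1 + X_2 + X_3 + X_4 + C_i = S + 2(C_{o1} + C_{o2})$. All function names are hypothetical.

```python
from itertools import product


def sw_xor(a: int, b: int) -> int:
    """Two-input SW XOR via threshold detection (ideal, equal-amplitude waves).

    Phasors are +1 (logic 0, phase 0) or -1 (logic 1, phase pi): in-phase waves
    add to amplitude 2 (-> logic 0), out-of-phase waves cancel (-> logic 1).
    """
    amplitude = abs((-1) ** a + (-1) ** b)
    return 0 if amplitude > 1 else 1


def sw_maj(a: int, b: int, c: int) -> int:
    """Three-input SW Majority via phase detection of the summed phasors."""
    return 0 if (-1) ** a + (-1) ** b + (-1) ** c > 0 else 1


def sw_4_2_compressor(x1, x2, x3, x4, ci):
    """Hypothetical behavioral model: 4 XOR + 2 Majority gates, repeaters as XORs."""
    i1 = sw_xor(x2, x3)          # repeater I1: X2 xor X3
    i2 = sw_xor(i1, x1)          # repeater I2: X1 xor X2 xor X3 (assumed to combine I1 with X1)
    i3 = sw_xor(x4, ci)          # repeater I3: X4 xor Ci
    co1 = sw_maj(x1, x2, x3)     # phase-detected carry-out 1
    co2 = sw_maj(i2, x4, ci)     # phase-detected carry-out 2
    s = sw_xor(i2, i3)           # threshold-detected sum
    return s, co1, co2


# Check the 4-2 compressor identity X1 + X2 + X3 + X4 + Ci = S + 2*(Co1 + Co2).
for bits in product((0, 1), repeat=5):
    s, co1, co2 = sw_4_2_compressor(*bits)
    assert sum(bits) == s + 2 * (co1 + co2), bits
print("all 32 input combinations satisfy the 4-2 compressor identity")
```

Because $C_{o1}$ depends only on X1, X2, and X3, it does not depend on $C_i$; this is what prevents a carry from rippling across a row of compressors in the reduction stage.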

A. Simulation Setup
We utilized the following parameters to validate the proposed structure by means of micromagnetic simulations: … We excited the SWs with a 10 GHz Gaussian pulse with a sigma of 500 ps to save energy, guarantee a single-frequency SW excitation, and achieve a high group velocity. The wavenumber k is determined from the SW dispersion relation, which makes the wavelength equal to 6 … Therefore, the micromagnetic simulation results demonstrate that the 4-2 SW compressor functions correctly.
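The claim that a 500 ps Gaussian pulse gives a practically single-frequency excitation can be quantified: a Gaussian envelope $e^{-t^2/(2\sigma_t^2)}$ has a Gaussian spectrum with standard deviation $\sigma_f = 1/(2\pi\sigma_t)$. The sketch below assumes that "sigma" refers to the time-domain standard deviation of the envelope around the 10 GHz carrier; the constant and function names are hypothetical.

```python
import math

F0 = 10e9            # carrier frequency stated in the text: 10 GHz
SIGMA_T = 500e-12    # assumed: time-domain standard deviation of the Gaussian envelope


def spectral_sigma(sigma_t: float) -> float:
    """Std. dev. (Hz) of |X(f)| for an envelope exp(-t**2 / (2 * sigma_t**2))."""
    return 1.0 / (2.0 * math.pi * sigma_t)


def energy_fraction_within(delta_f: float, sigma_t: float) -> float:
    """Fraction of the pulse energy inside F0 +/- delta_f.

    |X(f)|**2 is a Gaussian with std sigma_f / sqrt(2), so the fraction is
    erf(delta_f / sigma_f). Overlap with the negative-frequency lobe is neglected,
    which is justified when F0 >> sigma_f.
    """
    return math.erf(delta_f / spectral_sigma(sigma_t))


sigma_f = spectral_sigma(SIGMA_T)
print(f"spectral std: {sigma_f / 1e9:.3f} GHz around {F0 / 1e9:.0f} GHz")
print(f"energy within +/-1 GHz: {energy_fraction_within(1e9, SIGMA_T):.6f}")
```

Under this assumption the spectrum has a standard deviation of about 0.32 GHz, so essentially all of the excitation energy lies within ±1 GHz of 10 GHz; a ten times shorter pulse would spread over roughly ten times the bandwidth.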

Performance Evaluation
In order to assess the performance of the proposed 4-2 SW compressor and see the potential of such an approach, we evaluate it and compare it with the state-of-the-art SW, 22 nm CMOS, MTJ, DWM, and Spin-CMOS compressors.
We also assessed the proposed SW compressor at the application level, using the JPEG compression algorithm, to explore its potential at a larger scale. In the JPEG algorithm [20], the DCT and IDCT can be implemented using 4-2 compressors [19]. If the DCT and IDCT were implemented using the proposed 4-2 SW compressor, we would expect ultra-low energy consumption. As discussed previously, the proposed compressor consumes at least 3 orders of magnitude less energy than its Spin-CMOS counterpart, which indicates that a DCT/IDCT based on the proposed 4-2 SW compressor would consume at least 3 orders of magnitude less energy than a DCT/IDCT based on the Spin-CMOS 4-2 compressor [19].
In this paper, our main goal is to propose and validate the SW compressor as a proof of concept without considering thermal noise and variability effects. However, it was shown in [21] that thermal noise, edge roughness, and the trapezoidal cross section of the waveguide do not noticeably affect the gate's functionality. Therefore, we expect thermal noise and variability to have a limited effect on the compressor. Nevertheless, we will investigate such phenomena in future work.
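The extrapolation from compressor-level to DCT/IDCT-level energy can be made explicit with a simple additive energy model: a block built from $N$ compressors consumes $N E_{comp} + E_{other}$. In this sketch, the per-compressor ratio of 1000 reflects the 3 orders of magnitude stated above, while the compressor count and the overhead term are hypothetical values for illustration only. The per-compressor ratio carries over to the block when the compressors dominate its energy; any overhead that does not scale with the compressor technology reduces the block-level ratio.

```python
def system_energy_ratio(n_comp: int, e_comp_ref: float, e_comp_new: float,
                        e_other: float = 0.0) -> float:
    """Energy ratio E_ref / E_new of a block built from n_comp compressors.

    e_other is a hypothetical, technology-independent overhead (e.g. wiring or
    other logic) that is added to both designs.
    """
    e_ref = n_comp * e_comp_ref + e_other
    e_new = n_comp * e_comp_new + e_other
    return e_ref / e_new


# Normalized per-compressor energies: new = 1, reference = 1000 (3 orders of magnitude).
N = 64  # hypothetical number of compressors in a DCT/IDCT block
print(system_energy_ratio(N, 1000.0, 1.0))                # compressor-dominated block
print(system_energy_ratio(N, 1000.0, 1.0, e_other=64.0))  # with added shared overhead
```

With no overhead the block-level ratio equals the per-compressor ratio; with a shared overhead equal to the new design's total compressor energy, it drops to about 500.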
It was shown that SW technology can be very effective and has the potential to advance the state of the art in terms of energy consumption and scalability. However, some open issues are still to be solved [4]. For example, although Magneto-Electric (ME) cells seem to be the right choice for SW excitation and detection, their efficient behavior has not yet been experimentally demonstrated. Moreover, although SW technology is highly scalable, since the only limitation on SW device scalability is the SW wavelength, SWs have not yet been distinguished from noise at the nanoscale [4]. Nevertheless, we are confident that industry, as always, will find its way to efficient nanoscale SW devices and benefit from the SW computing paradigm.