Ferrimagnetic Synapse Devices for Fast and Energy-Efficient On-Chip Learning on an Analog-Hardware Neural Network

We have modeled domain-wall motion in ferrimagnetic and ferromagnetic devices through micromagnetics and shown that the domain-wall velocity can be 2-2.5X higher in the ferrimagnetic device than in the ferromagnetic device. We also show that this velocity ratio is consistent with recent experimental findings. Because of this velocity ratio, when such devices are used as synapses in a crossbar-array-based fully connected network, our system-level simulation shows that a ferrimagnet-synapse-based crossbar offers 4X faster (for the same energy efficiency) or 4X more energy-efficient (for the same speed) learning when compared to the ferromagnet-synapse-based crossbar.


I. INTRODUCTION
Spin-orbit-torque-driven domain-wall motion has been proposed as a working principle for ferromagnet-based nonvolatile memory devices, like the racetrack memory [1]-[4]. To make such a memory device competitive with other memory technologies, devices that exhibit faster domain-wall motion (due to in-plane current) than the ferromagnetic device have also been explored [5]-[11]. The ferrimagnetic Co/Gd-bilayer device is one such device [9]. Exchange-coupling torque (ECT), the origin of which is similar to that of the spin-orbit torque (SOT), has been experimentally shown to drive a domain wall at a very high velocity, near the angular-momentum-compensation temperature, in such a Co/Gd-bilayer-based synthetic ferrimagnet device [9].
Recently, along with memory devices, ferromagnetic domain-wall motion has been proposed as the working principle for synaptic devices in analog-hardware implementation of neural networks as well [12]-[22]. Neural networks currently form the most popular way to carry out machine-learning tasks for large-scale data [23]. It has been argued that if such a neural network is implemented on a crossbar array of the aforementioned synaptic devices (such a crossbar implements in-memory computing), instead of on a conventional digital computer where the memory unit and the computing unit are physically separate, then the same data-classification task can be carried out in a faster and more energy-efficient way [24]-[29].
Unlike a traditional non-volatile memory cell, which stores only bit 0 or bit 1, a synapse cell can store any one of multiple weight values between 0 and 1 at a given time in a nonvolatile way. Electrical pulses can be used to change the weight stored in the synapse from one of those values to another. Thus a synapse is essentially a near-analog (multiple values of weight), non-volatile memory device.
In a ferromagnetic synapse device, each weight value corresponds to a different position of the domain wall in the device. To modulate the weight value, the domain wall is moved using current pulses [12], [13], [20]-[22]. If the ferromagnet-based synapse devices are replaced by their ferrimagnetic counterparts, then intuitively, faster domain-wall motion (in the ferrimagnetic system, compared to the ferromagnetic system) will enable the use of shorter and lower-energy current pulses for weight modulation and hence a faster and more energy-efficient analog-crossbar-array-based neural-network implementation.
In this paper, through a combination of device-level and system-level simulations, we establish this intuitive idea. Our paper makes the following major contributions:
1) Device-Level Study: We model the Co/Gd-bilayer-based synthetic-ferrimagnet device here using micromagnetics and demonstrate high-speed domain-wall motion near the angular-momentum-compensation temperature. We benchmark our model against experimentally observed domain-wall motion in such a system, as reported in Blasing et al. [9]. While one-dimensional-domain-wall-theory-based modeling has been used to support the experimental data in Blasing et al. [9], to the best of our knowledge, this is the first micromagnetics-based study of the same device. We also simulate a ferromagnetic device for comparison and show that, for certain conditions, the domain-wall velocity of the ferrimagnetic device is 2-2.5 times higher (2-2.5X) than that of the ferromagnetic device. We compare the domain-wall velocity numbers (in both the ferrimagnetic device and the ferromagnetic device) that we obtain from our simulation with those reported in recent experiments [3], [9] to support the domain-wall-velocity ratio that we report here (2-2.5X). This velocity ratio results in the system-level advantage of the ferrimagnetic synapse over the ferromagnetic synapse that we report below.
2) System-Level Study: We next calculate the speed and energy consumption for on-chip learning (training in hardware) on a crossbar array of ferrimagnetic synapses, which we have simulated at the device level using micromagnetics as mentioned above. After that, we calculate the speed and energy consumption for on-chip learning on a crossbar array of ferromagnetic synapses [12], [13], [20]-[22]. We show that when the time for training is the same for both the crossbar arrays, energy consumption is four times (4X) lower in the ferrimagnet-based crossbar when compared to the ferromagnet-based crossbar. Similarly, when the energy consumption for training is kept the same, the time taken for training is four times (4X) lower in the ferrimagnet-based crossbar when compared to the ferromagnet-based crossbar. We show this for training on two popular machine-learning data sets: Fisher's Iris and MNIST [30], [31].
Section II contains our device-level study and Section III contains our system-level study. Section IV concludes the paper.
As mentioned in the previous section, we first compare the domain-wall velocity in the ferrimagnetic and the ferromagnetic device through micromagnetic simulations. For this purpose, we first show schematics of the two devices in Fig. 1 and briefly describe the operating physics in each case. Fig. 1(a) shows the schematic of the Co/Gd-bilayer-based ferrimagnetic device. Magnetic moments in the Co and the Gd layer are coupled anti-parallel to each other through an exchange-coupling mechanism [9]. But since the magnitudes of the magnetic moments in the two layers are unequal for all values of temperature other than the magnetic compensation temperature, this is a ferrimagnet and not an anti-ferromagnet. In-plane current flowing through the underlying heavy-metal layer (Pt) results in the injection of spin current into the layers above it, due to the spin Hall effect exhibited by the heavy metal. Hence, due to the exchange-coupling torque, the domain wall in the Co layer, and also in the Gd layer, moves in the same direction, retaining the same relative orientation between the moments of the two layers as before [9]. The micromagnetic simulation we carry out here on the micromagnetic package "mumax3" captures this physics, as we show later in this section [32]. Fig. 1(b) shows the schematic of the CoFe-layer-based ferromagnetic device. In-plane current flowing through the underlying heavy-metal layer (Pt) results in the injection of spin current into the ferromagnetic CoFe layer [1]-[3]. Hence, due to the spin-orbit torque, the domain wall moves, as captured in our micromagnetic simulation of this device on "mumax3" [20]-[22].
The lateral size of both the ferrimagnetic device and the ferromagnetic device in our simulation is 600 nm × 50 nm. The thickness of the CoFe layer in the ferromagnetic device is 1 nm, while in the ferrimagnetic device, the thickness of the Co layer is 1 nm and the thickness of the Gd layer is also 1 nm. For both devices, the spin Hall angle ($\theta_{SH}$) of the heavy metal (Pt) is taken to be 0.13 [9]. The thickness of the heavy-metal (Pt) layer, below the ferrimagnetic/ferromagnetic layer, is taken to be 10 nm, which is greater than the spin diffusion length in Pt [33], [34]. Hence, we can take the vertical spin-current density injected by the heavy-metal layer into the ferrimagnetic/ferromagnetic layer above it as $J_s = \theta_{SH} J_c$, where $J_c$ is the in-plane charge-current density [35]-[37]. For the ferromagnetic device, the dynamics of the magnetic moments of the CoFe layer above the Pt layer is simulated using the micromagnetic simulation package "mumax3" [32] under the influence of this spin current from the Pt layer. For the ferrimagnetic device (Fig. 1(a)), the dynamics of the magnetic moments of the Co layer, just above the Pt layer, is simulated under the influence of the spin current from the Pt layer. We also simulate the dynamics of the moments of the Gd layer, which is just above the Co layer, alongside that of the Co layer; the spin current from the Pt layer indirectly affects the moments in the Gd layer because the moments of the Gd layer are exchange-coupled with those of the Co layer [9].
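The relation between the charge-current density and the injected spin-current density can be illustrated with a short calculation. This is a minimal sketch: the function name and the sample current densities are illustrative choices, and only the spin Hall angle of 0.13 is taken from the text.

```python
# Sketch: vertical spin-current density injected by the Pt layer,
# J_s = theta_SH * J_c, valid when the Pt layer is thicker than its
# spin diffusion length (as stated in the text).

THETA_SH = 0.13  # spin Hall angle of Pt quoted in the text


def spin_current_density(j_c: float, theta_sh: float = THETA_SH) -> float:
    """Return J_s (A/m^2) for an in-plane charge-current density j_c (A/m^2)."""
    return theta_sh * j_c


# Illustrative charge-current densities (not values taken from the paper's figures).
for j_c in (1e12, 2e12, 3e12):
    j_s = spin_current_density(j_c)
    print(f"J_c = {j_c:.1e} A/m^2 -> J_s = {j_s:.2e} A/m^2")
```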
We micromagnetically simulate the ferrimagnetic device at various temperatures, from 125 K to 175 K, since the domain-wall velocity in such a ferrimagnetic device has been shown to vary widely with temperature in experiments [9]. The magnetic moment of Co ($m_{Co}$) and the magnetic moment of Gd ($m_{Gd}$) vary with temperature ($T$) following Bloch's law (Eq. (1)) [38], where $T_c^{Co}$ is the Curie temperature of Co, $T_c^{Gd}$ is the Curie temperature of Gd, $\xi_{Co}$ is the exponential parameter describing the temperature dependence for the Co layer, and $\xi_{Gd}$ is the exponential parameter describing the temperature dependence for the Gd layer [9], [38]-[41]. $m_{Co}^{T=0}$ is the magnetic moment of the Co layer at 0 K and $m_{Gd}^{T=0}$ is the magnetic moment of the Gd layer at 0 K. In our simulation, $T_c^{Co}$ = 600 K, $T_c^{Gd}$ = 520 K, $\xi_{Co}$ = -0.7, $\xi_{Gd}$ = 4, $m_{Co}^{T=0}$ = 0.7 mA, and $m_{Gd}^{T=0}$ = 3.6 mA, just as in Blasing et al. [9].
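The temperature dependence of the two moments can be explored numerically. The explicit form of Eq. (1) is not reproduced in this text, so the sketch below assumes a power-law form, m(T) = m(0) (1 - T/T_c)^ξ, purely for illustration; the function names are also hypothetical. With the parameter values quoted above, this assumed form gives a ratio m_Gd/m_Co close to 0.9 at 165 K, but it should be checked against Eq. (1) of the paper and against [9] before being relied upon.

```python
# ASSUMPTION: the functional form below is not given explicitly in this text.
#     m(T) = m(0) * (1 - T / T_c) ** xi
# The parameter values are the ones quoted in the text (from Blasing et al. [9]).

PARAMS = {
    "Co": {"m0": 0.7, "t_c": 600.0, "xi": -0.7},  # m0 in mA, t_c in K
    "Gd": {"m0": 3.6, "t_c": 520.0, "xi": 4.0},
}


def moment(layer: str, temperature: float) -> float:
    """Magnetic moment (mA) of the given layer under the assumed power law."""
    p = PARAMS[layer]
    if not 0.0 <= temperature < p["t_c"]:
        raise ValueError("temperature must lie in [0, T_c)")
    return p["m0"] * (1.0 - temperature / p["t_c"]) ** p["xi"]


def moment_ratio(temperature: float) -> float:
    """Ratio m_Gd / m_Co at the given temperature (K)."""
    return moment("Gd", temperature) / moment("Co", temperature)


# Scan the simulated temperature window (125 K to 175 K) in 10 K steps.
for t in range(125, 176, 10):
    print(f"T = {t} K: m_Gd/m_Co = {moment_ratio(float(t)):.3f}")
```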
The strength of exchange coupling between the Co and Gd layers is taken to be 0.9 mJ/m$^2$ [9]. Such a value of exchange coupling is high enough to ensure that the moments of the two layers are always oriented anti-parallel, so the domain walls in the two layers have the same width and move with the same velocity [42], [43]. The exchange-coupling constant within the Gd layer is taken to be $0.6 \times 10^{-11}$ J/m and that within the Co layer is also taken to be $0.6 \times 10^{-11}$ J/m. We also include an anti-symmetric exchange due to the Dzyaloshinskii-Moriya interaction (DMI) at the Pt/Co interface using a DMI constant of 0.16 mJ/m$^2$ [9]. But we do not consider any DMI for the Co/Gd interface, above the Pt/Co interface (Fig. 1(a)), since no DMI has been reported for the Co/Gd interface thus far. The damping constant is taken to be 0.1 for both layers. While most of our simulation parameters for the ferrimagnetic device are the same as those used in Blasing et al. [9], as mentioned earlier, the device modeling in [9] is based on one-dimensional-domain-wall theory while our device modeling is based on micromagnetics (on "mumax3").
The ferromagnetic device is simulated here at room temperature only, since the ferromagnetic-domain-wall velocity is not known to vary significantly with temperature [44], [45]. We use the following simulation parameters on "mumax3" for the ferromagnetic device: saturation magnetization = $7 \times 10^{5}$ A/m, perpendicular magnetic anisotropy (PMA) constant = $8 \times 10^{5}$ J/m$^3$, exchange constant = $1 \times 10^{-11}$ J/m, and damping factor = 0.3. We also assume an interfacial Dzyaloshinskii-Moriya interaction (DMI) strength of 1.2 mJ/m$^2$, due to which the domain wall acquires Néel-type chirality. These simulation parameters have also been used in the experimentally benchmarked micromagnetic study of the heavy-metal/ferromagnet-bilayer device we consider here [21], [22], [46].
Fig. 2 shows the evolution of the magnetic moments in the Co layer and the Gd layer under the application of in-plane charge current through the heavy-metal (Pt) layer. Each image contains two panels representing the map of the magnetization state in the Gd layer (upper panel) and the Co layer (lower panel) at a particular instant. In the color maps of Fig. 2, red represents magnetic moments in the +z direction and blue represents magnetic moments in the -z direction. Fig. 2(a) and (d) correspond to the initial magnetization state at t = 0, where the domain wall is situated at the center of the device, both in the Gd layer and the Co layer. Upon the application of current density in the +x direction, the domain wall attains a positive velocity and moves in the +x direction (Fig. 2(b), (c)), while for a negative polarity of current, the domain wall moves in the -x direction (Fig. 2(e), (f)).
Based on the evolution of magnetic moments over time, as simulated through micromagnetics and shown in Fig. 2, we calculate the domain-wall velocity in the ferrimagnetic device. Since we carry out our micromagnetic simulations at various temperatures using the variation of magnetic moments of Gd and Co with temperature (given by Eq. (1)), we obtain the domain-wall velocity for different current densities and different temperature values. In Fig. 3, we show the variation of domain-wall velocity in the ferrimagnet as a function of applied charge-current density for different values of temperature. At each temperature, we recalculate the values of magnetic moments for Co and Gd using Eq. (1) and use the values in our micromagnetic simulation of the ferrimagnetic device on "mumax3". Thus, each temperature value corresponds to a different value of the ratio of the magnetic moment of the Gd layer to that of the Co layer ($m_{Gd}/m_{Co}$).
Fig. 6. Schematic of the crossbar-array-based fully connected neural network (FCNN), with one hidden layer, simulated here. At each junction of the horizontal bars and vertical bars, a synaptic device is present (ferromagnetic/ferrimagnetic). Analog peripheral circuitry is used to compute the changes in the synaptic weights needed for training the network. The purpose of the three separate crossbars and other details are provided in the text here and in [22].
As shown in Fig. 3, for all such temperature values, the domain-wall velocity increases with the increase in current density and then, in some cases, saturates at higher current-density values. Within the range of current density used in our simulations and shown in Fig. 3, the highest domain-wall velocity is achieved at T = 165 K. At this temperature, the magnetic moments of Gd and Co are related as $m_{Gd} = 0.9\, m_{Co}$ and the angular momentum is fully compensated ($T = T_A$). This can be seen more clearly in Fig. 4, where we directly plot the velocity of the domain wall as a function of the ratio of magnetic moments of Gd and Co ($m_{Gd}/m_{Co}$) for various values of current density. In each case, the maximum velocity is observed at $m_{Gd}/m_{Co} = 0.9$, which corresponds to the aforementioned angular-momentum-compensation temperature ($T_A$). This matches the experimental finding and the corresponding one-dimensional-domain-wall-theory-based calculation in Blasing et al. [9], where it has also been reported that the ferrimagnetic domain wall attains maximum velocity at the angular-momentum-compensation temperature. A domain-wall velocity as high as the one we report here near the angular-momentum-compensation temperature (≈ 650 m/s) has also been reported in Blasing et al. [9] for a similar value of current density (≈ $3 \times 10^{12}$ A/m$^2$). Thus the micromagnetic model of the ferrimagnetic device we develop here has been benchmarked against recent experiments.
Thus our micromagnetic model for the ferromagnetic device is also experimentally benchmarked. For comparison, in Fig. 5(a), along with the velocity of the ferromagnetic domain wall, we also plot the velocity of the ferrimagnetic domain wall at the angular-momentum-compensation temperature (T = 165 K), the temperature at which the ferrimagnet's domain-wall velocity is the highest. Fig. 5(b) shows the ratio of the velocity of the ferrimagnetic domain wall to that of the ferromagnetic domain wall as a function of current density, as obtained from Fig. 5(a). We observe that across a wide range of current densities, the domain-wall velocity in the ferrimagnetic device at the angular-momentum-compensation temperature (165 K) is 2-2.5 times higher than that of the domain wall in the ferromagnetic device at room temperature. Since the ferromagnetic and ferrimagnetic domain-wall velocity values obtained from our models match recent experiments (as explained before), this ratio of domain-wall velocity in a ferrimagnetic device to that in a ferromagnetic device, which we report here, is also consistent with recent experiments [3], [9].
Crossbar III is used to compute the change in weight values of the synapses in crossbar I, needed for training the network, by multiplying the common part of the weight update ($\Delta V_1, \Delta V_2, \ldots, \Delta V_N$) with the synaptic weights of crossbar II ($v_{1,1}, v_{2,1}, \ldots, v_{N,1}, v_{1,2}, v_{2,2}, \ldots, v_{N,2}, \ldots, v_{1,P}, v_{2,P}, \ldots, v_{N,P}$). This is the back-propagation algorithm used for training multi-layer neural networks, which is implemented here through crossbar III and peripheral analog circuits, as shown in Fig. 6. More details on this implementation can be found in [22].
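The multiplication carried out by crossbar III amounts to a matrix-vector product between the update vector and the stored weights. The following is a minimal software sketch of that product, not the analog circuit itself; the function name, the index convention (N rows, P columns), and the numerical values are assumptions made for illustration.

```python
def crossbar_matvec(weights, inputs):
    """Return outputs[p] = sum over n of inputs[n] * weights[n][p].

    weights: N rows by P columns, one value per synapse (assumed layout).
    inputs:  N values applied to the rows, e.g. the common part of the
             weight update (Delta V_1 ... Delta V_N).
    """
    if len(weights) != len(inputs):
        raise ValueError("need one input per weight row")
    num_cols = len(weights[0])
    if any(len(row) != num_cols for row in weights):
        raise ValueError("all weight rows must have the same length")
    return [
        sum(inputs[n] * weights[n][p] for n in range(len(inputs)))
        for p in range(num_cols)
    ]


# Illustrative values only: N = 2 rows, P = 3 columns.
v = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
delta_v = [0.5, -1.0]
back_propagated = crossbar_matvec(v, delta_v)
print(back_propagated)
```

In hardware, each product is a current through one synapse and each sum is formed on a shared bar, so all P outputs are obtained in parallel.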
Fig. 7. Comparison of energy consumed (in the synapse) per synaptic-weight-update pulse vs. time duration of such a pulse for the following two cases: ferrimagnetic synapses (at the angular-momentum-compensation temperature, when the domain-wall velocity is the highest) and ferromagnetic synapses (at room temperature).
The learning algorithm we have used here to train our system is the same thresholding algorithm that has been used to train a two-layer FCNN in [22]. Following this algorithm, at any given iteration, the weight of each synapse changes by at most one discrete step or quantum. Thus, if each weight is thought of as a binary string with N bits, at each iteration, only one of three things can happen:
1) The weight increases by one step or quantum. This means that if the weight is of 2 bits, '00' can become '01', '01' can become '10', and '10' can become '11'. For our simulated systems, both ferromagnet-based and ferrimagnet-based, each synapse stores a weight value with a resolution of ≈6 bits (total number of conductance states per synapse = 50). For an increase in weight, the domain wall moves by 10 nm in the +x direction due to a current pulse of positive polarity (as shown in Fig. 2(b) and (c)). The total length of the device is 600 nm, as mentioned earlier. 50 conductance states, with one state changing to the next due to 10-nm motion of the domain wall, imply a distance of 500 nm traversed by the domain wall. An extra 50 nm is allotted at each of the two edges of the device so that the domain wall is not destroyed at the edges.
2) The weight decreases by one step or quantum. This means that if the weight is of 2 bits, '11' can become '10', '10' can become '01', and '01' can become '00'. For both our ferromagnet device and ferrimagnet device, for such a decrease in weight, the domain wall moves by 10 nm in the -x direction due to a current pulse of negative polarity (as shown in Fig. 2(e) and (f)).
3) The weight remains unchanged. In this case, no current pulse is applied and the domain wall does not move.
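The three update cases above can be sketched as a small state machine that maps a discrete weight state to a domain-wall position. This is an illustrative toy model rather than the simulation code used in the paper: the class name, the state-to-position convention, and the behavior at the ends of the range are assumptions, while the device length, step size, edge margin, and number of states come from the text.

```python
import math

# Device geometry and step size quoted in the text.
DEVICE_LENGTH_NM = 600   # total length of the synapse
EDGE_MARGIN_NM = 50      # region kept free at each edge
STEP_NM = 10             # domain-wall displacement per weight quantum
NUM_STATES = 50          # conductance states per synapse

# All wall positions stay inside the region bounded by the two edge margins.
assert EDGE_MARGIN_NM + (NUM_STATES - 1) * STEP_NM <= DEVICE_LENGTH_NM - EDGE_MARGIN_NM


class DomainWallSynapse:
    """Toy model: the weight state is an integer index of the wall position."""

    def __init__(self, state: int = NUM_STATES // 2):
        if not 0 <= state < NUM_STATES:
            raise ValueError("state out of range")
        self.state = state

    @property
    def position_nm(self) -> int:
        # Assumed convention: state 0 places the wall at the edge margin.
        return EDGE_MARGIN_NM + self.state * STEP_NM

    def update(self, direction: int) -> int:
        """Apply one training iteration and return the pulse polarity used.

        direction is +1 (increase), -1 (decrease) or 0 (unchanged).
        Returns +1 or -1 for a positive or negative current pulse and 0
        when no pulse is applied. Holding the state at the ends of the
        range is an assumption of this sketch.
        """
        if direction not in (-1, 0, 1):
            raise ValueError("direction must be -1, 0 or +1")
        new_state = self.state + direction
        if direction == 0 or not 0 <= new_state < NUM_STATES:
            return 0
        self.state = new_state
        return direction


synapse = DomainWallSynapse()  # wall starts at the device center
for step in (+1, +1, 0, -1):
    polarity = synapse.update(step)
    print(step, polarity, synapse.state, synapse.position_nm)

# 50 states correspond to log2(50) bits, which rounds to the ~6 bits quoted.
print("bits per synapse:", math.log2(NUM_STATES))
```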
The same thresholding algorithm, as described above and elucidated in detail in [22], is used here to train the crossbar arrays of ferromagnetic synapses and the crossbar arrays of ferrimagnetic synapses. Since both the ferromagnetic synapses and the ferrimagnetic synapses have the same number of conductance states (50), the bit resolution for each synapse in the corresponding FCNN is the same in either case. So the final training and test accuracy numbers are the same for the ferromagnet-synapse-based crossbar and the ferrimagnet-synapse-based crossbar for any particular data set. We use two popular machine-learning data sets, Fisher's Iris and MNIST, for our purpose [30], [31].
For Fisher's Iris data set, from a database of 150 samples, we choose 100 samples for training and 50 samples (different from the training samples) for testing.
Fig. 8. Total energy consumed in all the synapses per epoch of the training process as a function of the number of epochs, both in a crossbar array of ferrimagnetic synapses (blue plots) and in a crossbar array of ferromagnetic synapses (red plots), for the following cases: (a) Fisher's Iris data set used for training, each weight-update pulse is 100 ps long; (b) Fisher's Iris data set used, each weight-update pulse is 300 ps long; (c) MNIST data set used, each weight-update pulse is 100 ps long; (d) MNIST data set used, each weight-update pulse is 300 ps long.
Table columns: Data Set; Spintronics Devices; Train Accuracy (%); Test Accuracy (%); Duration of the pulse (ps); Total time (µs); Total energy (J).
For the MNIST data set, 1000 images (different from the training images) are used for testing. Each image is a 28-by-28-pixel image and can be classified into one of ten categories, corresponding to the ten digits 0-9. So we use a 784 × 100 × 10 FCNN for this purpose. After 100 epochs, our trained FCNN yields a training accuracy of 97.98% and a testing accuracy of 92.4% (Table I, Table II) for the MNIST data set.
Thus, the same thresholding algorithm is applied on the ferromagnet-synapse-based network and the ferrimagnet-synapse-based network; the same number of pulses is needed to update the weights of all the synapses in the two networks and achieve training; and the same accuracy numbers are obtained (Table I, Table II). However, the domain-wall velocity is 2-2.5 times higher in the ferrimagnetic device than in the ferromagnetic device for the same value of current density (Fig. 5). So, when we update the synaptic weights of both the ferromagnetic and ferrimagnetic crossbars with current pulses of the same duration (Fig. 5), the current pulses needed to increase/decrease the synaptic weight by one step/quantum (this involves moving the domain wall by a fixed distance of 10 nm, as mentioned before) are higher in magnitude for the ferromagnet crossbar than for the ferrimagnet crossbar. This leads to higher energy consumption per pulse for the ferromagnetic synapse compared to the ferrimagnetic synapse, for the same pulse duration.
The assumption here is that the resistance of the path through which the in-plane current flows is the same for both synapses. This assumption is valid because in the ferrimagnetic device, the resistivity of the Gd/Co layer (≈ 250-350 µΩ cm [48], [49]) is much higher than that of the Pt layer (≈ 10 µΩ cm [50]). And in the ferromagnetic device, the resistivity of CoFe (≈ 170 µΩ cm [35]) is much higher than that of Pt. Also, the thickness of the Gd (1 nm), Co (1 nm), or CoFe layer (1 nm) is one order of magnitude smaller than that of the Pt layer (10 nm). So in either device, the resistance of the Gd/Co layer or the CoFe layer is much higher than that of the Pt layer. Hence, the current can be assumed to flow through the much more conducting Pt layer. Since the thickness of the Pt layer is the same in both the ferromagnetic and ferrimagnetic device, and the length and the width of both devices are the same, the resistance of the path through which the in-plane current flows can indeed be considered the same for both synapses.
Energy dissipated in the synapse is the product of the square of the current, the pulse duration, and the resistance of the device. Between the ferrimagnetic and ferromagnetic synapses, with the resistance and the pulse duration both the same, higher current density means higher energy (the area of cross-section is the same, so higher current density means proportionately higher current). Fig. 7 shows by how much the energy consumed per synapse for a pulse on the ferromagnetic synapse is higher than that for a pulse on the ferrimagnetic synapse for the same value of synaptic-weight update (increase/decrease by one quantum) and the same duration of the pulse. We show this for different values of pulse duration.
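The resistance argument and the energy expression above can be checked with a short calculation. This is a sketch under stated assumptions: the layers are treated as parallel uniform resistors with R = ρL/(wt), the function names and the two current densities are illustrative, and the factor of 2 between them is chosen only to show how the energy scales, not taken from Fig. 7.

```python
# Geometry and resistivities quoted in the text (1 micro-ohm cm = 1e-8 ohm m).
LENGTH_M = 600e-9
WIDTH_M = 50e-9
T_PT_M = 10e-9          # Pt thickness
T_COFE_M = 1e-9         # CoFe thickness
RHO_PT = 10e-8          # ~10 micro-ohm cm
RHO_COFE = 170e-8       # ~170 micro-ohm cm


def layer_resistance(resistivity: float, thickness: float) -> float:
    """Resistance (ohm) of one layer along the device length, R = rho*L/(w*t)."""
    return resistivity * LENGTH_M / (WIDTH_M * thickness)


r_pt = layer_resistance(RHO_PT, T_PT_M)
r_cofe = layer_resistance(RHO_COFE, T_COFE_M)

# Fraction of the total current carried by the Pt layer (parallel resistors).
pt_current_fraction = r_cofe / (r_cofe + r_pt)


def pulse_energy(current_density: float, duration: float) -> float:
    """Energy (J) of one pulse, E = I^2 * R * t, with current only in Pt."""
    current = current_density * WIDTH_M * T_PT_M
    return current ** 2 * r_pt * duration


# Illustrative comparison at equal pulse duration: doubling the current
# density quadruples the energy per pulse.
DURATION_S = 100e-12
e_low = pulse_energy(1e11, DURATION_S)    # hypothetical ferrimagnet pulse
e_high = pulse_energy(2e11, DURATION_S)   # hypothetical ferromagnet pulse

print(f"R_Pt = {r_pt:.1f} ohm, R_CoFe = {r_cofe:.0f} ohm")
print(f"fraction of current in Pt = {pt_current_fraction:.4f}")
print(f"energy ratio = {e_high / e_low:.2f}")
```

Because the energy depends on the square of the current, a ferromagnetic synapse that needs about twice the current density for the same 10-nm displacement consumes about four times the energy per pulse.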
Next, in Fig. 8, we plot and compare the total energy consumption (in the synapses) for training the ferromagnet-synapse-based and ferrimagnet-synapse-based multi-layer FCNN through the synaptic-weight updates mentioned above, keeping the duration of each weight-update pulse constant. For both Fisher's Iris data set and the MNIST data set, the total energy per epoch is about 4X higher for the ferromagnet-synapse-based crossbar compared to the ferrimagnet-synapse-based crossbar, whether the duration of each pulse is 100 ps or 300 ps. The total energy for training at the end of 100 epochs (when training has been achieved) is listed in Table I for all these cases. The same 4X difference in energy between training the ferromagnetic-synapse crossbar and the ferrimagnetic-synapse crossbar is observed there as well. The total time for learning remains the same for the ferromagnetic-synapse crossbar and the ferrimagnetic-synapse crossbar for the same data set when the duration of each weight-update pulse is kept unchanged.
If, instead of keeping the duration of the weight-update pulse fixed, we keep the energy dissipated in the synapse per pulse fixed between the ferromagnet-synapse-based crossbar and the ferrimagnet-synapse-based crossbar, then much longer pulses are needed for the ferromagnet crossbar compared to the ferrimagnet crossbar, again due to the much lower velocity of the domain wall in the ferromagnetic synapse compared to the ferrimagnetic synapse for the same value of current density (Fig. 5). Table II shows that the total time for learning is hence 4X higher for the ferromagnet crossbar compared to the ferrimagnet crossbar for the same data set when the energy per weight-update pulse is kept constant (the total energy for learning is the same for the ferromagnet crossbar and the ferrimagnet crossbar in this case). Thus, through Fig. 8, Table I, and Table II, we have shown that a ferrimagnet-synapse-based crossbar offers 4X faster (for the same energy efficiency) or 4X more energy-efficient (for the same speed) learning when compared to the ferromagnet-synapse-based crossbar.
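A simple scaling argument shows why the same factor appears in both comparisons. The derivation below is a sketch that assumes the domain-wall velocity is proportional to the current density, $v = \mu J$, with a device-dependent mobility $\mu$; this linear relation and the symbols $\mu$, $A$ (cross-sectional area of the current path), and $d$ (displacement per weight quantum) are assumptions introduced here, and the linear relation does not hold where the simulated velocity saturates at high current density.

```latex
% Energy per pulse and displacement per pulse (R: resistance, t: pulse duration)
E = I^{2} R\, t = (J A)^{2} R\, t, \qquad d = v\, t = \mu J t
% Eliminate J using the fixed displacement d
J = \frac{d}{\mu t} \;\;\Rightarrow\;\; E = \frac{A^{2} R\, d^{2}}{\mu^{2}\, t}
% Same pulse duration t for both devices
\frac{E_{\mathrm{FM}}}{E_{\mathrm{FiM}}} = \left(\frac{\mu_{\mathrm{FiM}}}{\mu_{\mathrm{FM}}}\right)^{2}
% Same energy per pulse E for both devices
\frac{t_{\mathrm{FM}}}{t_{\mathrm{FiM}}} = \left(\frac{\mu_{\mathrm{FiM}}}{\mu_{\mathrm{FM}}}\right)^{2}
```

Under these assumptions, a velocity ratio of about 2 between the ferrimagnetic (FiM) and ferromagnetic (FM) devices gives a factor of about 4 in energy at fixed pulse duration, and the same factor in pulse duration at fixed energy per pulse.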

IV. CONCLUSION
In summary, we have modeled domain-wall motion in ferrimagnetic and ferromagnetic devices through micromagnetics and shown that the domain-wall velocity can be 2-2.5X higher in the ferrimagnetic device than in the ferromagnetic device. We also show that this velocity ratio is consistent with recent experimental findings [3], [9]. Because of this velocity ratio, when such devices are used as synapses in a crossbar-array-based fully connected network, our system-level simulation shows that a ferrimagnet-synapse-based crossbar offers 4X faster (for the same energy efficiency) or 4X more energy-efficient (for the same speed) learning when compared to the ferromagnet-synapse-based crossbar. Thus a ferrimagnetic synapse device, as modeled here, can pave the way for faster and more energy-efficient on-chip learning of analog-hardware neural networks.