Towards the non-zero field cesium magnetic sensor array for magnetoencephalography

Magnetic sensors developed for application in magnetoencephalography must meet a number of requirements; the main ones are compactness, sensitivity and response speed. We present a quantum optically pumped atomic sensor with cell volume of 0.5 cm that meets these requirements and is operable in nonzero magnetic fields. The ultimate sensitivity of the sensor was estimated as (using the criteria of the ratio of the slope of the magnetic resonance signal to the shot noise spectral density) to be better than 5 fT/ √ Hz. The actual sensitivity, measured in a gradiometric scheme, reaches 13 fT/ √ Hz per sensor. We also present a novel and fast algorithm for optimization of the geometric properties of non-zero field sensor array with respect to maximization of the information transfer rate for cortical sources.


I. INTRODUCTION
M AGNETOENCEPHALOGRAPHY (MEG) is a rapidly developing non-invasive technology for recording the brain activity, characterized by a high temporal resolution and allowing the study of fast neuronal processes without violating the integrity of tissues. Traditional MEG sensors register magnetic fields of the order of fT created by currents propagating along the dendrites of neurons in the cerebral cortex. MEG methods uniquely combine high temporal (less than 5 ms) resolution with acceptable accuracy of spatial localization of cortical activity (less than 10 mm). The magnetic activity of the brain has been detected for decades by using quantum superconducting interference devices (SQUID) in MEG complexes assembled into an array of sensors in order to register a magnetic field map in real time [1], [2] and reconstruct cortical activity [1], [3].
The best low-noise SQUID magnetometers have sensitivity of less than 1 fT/ Hz [4], [5], well suited for magnetoencephalography. But the disadvantages of such MEG systems are their stationarity and high cost. Furthermore, SQUIDs require continuous cooling with liquid helium, which also increases operating costs. Since the helmet containing the array of sensors is a Dewar vessel, its shape is fixed and sized to fit the largest individual head. The impossibility of adapting the array of sensors to the anatomical features of each subject leads to a significant loss of in signal-to-noise ratio in the measured signals and loss of spatial resolution. An additional increase of the source-sensor distance is due to the minimal required thickness of the Dewar vessel wall.
An alternative to SQUID is optically pumped magnetometer (OPM), based on optically detectable magnetic resonance (ODMR) in alkali metal atoms (K, Rb or Cs) in the gas phase. A record sensitivity among OPMs at the level of 0.16 fT/ Hz is demonstrated by magnetometers operating in a spin-exchange relaxation-free (SERF) mode [6], [7]. Currently, multichannel MEG systems comprising SERF sensors are being tested in the labs around the world [8]- [10].
One of the advantages of such devices is the small size of the sensor: at a sufficiently high temperature that provides the required density of atomic vapors, the sensitive element of the magnetometer (the vapor cell) can have a volume of less than 1 cm 3 , which allows the sensors to be placed in close proximity to the subject; this increases the signal strength and improves the spatial resolution [11]. The disadvantages of SERF magnetometers as applied to MEG problems are the requirement to provide a zero (i.e., not exceeding several tens of nanotesla) uniform external magnetic field, as well as a strong mutual influence of nearby sensors due to the need for low-frequency modulation of the magnetic field in the SERF scheme.
The aforementioned problems of MEG systems implementation put impetus on the development of highly sensitive magnetometers that would not require zero magnetic fields or ultra-low temperatures [12]- [14]. These properties are characteristic of "classical" OPMs (i.e. OPMs that use RF-field driven magnetic resonance), which demonstrated a sensitivity of 2 fT/ Hz in a laser pumped scheme almost 25 years ago [15]. But this sensitivity was achieved, firstly, due to the large size of the cell (which was a sphere with a diameter of 15 cm), and, secondly, due to the application of a paraffin antirelaxation coating on its inner surface. A decrease in cell size results in sensitivity lost proportional to the square root of the volume [16]. An increase in the concentration of atoms leads to an increase in the amplitude of the ODMR signal, as well as in the rate of spin-exchange relaxation, which determines the line-width and, ultimately, sensitivity; this has long hampered the development of ultra-sensitive compact OPM.
Subsequently, methods were found to minimize the spinexchange broadening; for example, the aforementioned SERF magnetometers use the effect of almost complete suppression of spin-exchange broadening in zero magnetic field discovered by W.Happer [17], [18]. Back in 1999, Happer's group discovered another effect -a partial suppression of the spin-exchange relaxation rate at high pumping intensities in non-zero field [19]. At high enough optical pumping rate a significant fraction of the atoms are collected in a so-called "stretched state" and collisions do not change their spin states. This effect was used in double-beam M X -magnetometer with high pressure buffer gas cell [20]. The sensitivity about 10 fT/ Hz in a volume of about 1.5 cm 3 was demonstrated. Later, a magnetometer based on a single-beam M X scheme was presented and a sensitivity of 42 fT/ Hz was achieved in a 9.3 mm 3 buffer gas cell [21]. Then, the sensitivity of order 1 fT/ Hz was demonstrated in double-beam radio-frequency magnetometer with 10 cm 3 volume paraffin coated cell [22].

II. NON-ZERO FIELD SENSOR FOR MEG
As a basis for our nonzero field sensor, we used a slightly modified two-beam M X scheme [23]; our experimental setup contained two such sensors and operated in a gradiometric configuration. The experimental setup is shown in Figure 1. Compact gas cells with Cs vapors are located in the center of a non-magnetic thermostat containing non-magnetic heaters and a thermal sensor. To suppress relaxation on the cell walls, the cells were filled with buffer gas (nitrogen at ≈100 Torr), which slowed down the diffusion of atoms to the walls and effectively quenched the luminescence. The thermostat was located in a cylindrical multilayer shield, the magnetic field in which was maintained at the level of 12 µT by a longitudinal solenoid with a system of compensating magnetic coils. The cells were located right next to each other -the distance between their near edges did not exceed 1 mm.
Optical pumping was carried out by circularly polarized beam along the magnetic field. Pumping laser was stabilized to the hyperfine transitions of the cesium D1 line. Signal detection was carried out by the transverse linearly polarized beam, frequency of which is detuned from the center of the Cs D 1 absorption line by ≈30 GHz downward, so detection was achieved in a quantum non-destructive (QND) manner. The magnetic resonance signal in each cell was driven by a transverse resonant AC RF field and detected by a balanced photodetector, consisting of a polarizing beamsplitter cube and two photodiodes. The angle of the beam splitter axis to the polarization azimuth of the incoming light in the absence of resonance was 45 degrees; the differential photocurrent in such a scheme is (as a first approximation) proportional to the angle of polarization rotation [24]. The photodetector signals were demodulated by lock-in-amplifiers Stanford SR830.
In the gradiometric configuration, the resonances in two cells were driven by common RF coils, and the signals detected by two balanced photodetectors were subtracted from each other. In this case, the signal from one cell was also used for the magnetic field stabilization. The field in the second cell was modulated with an amplitude of 10 pT r.m.s. and a frequency of 10 Hz in order to calibrate the sensitivity of the gradiometer. Special efforts were made to equalize the parameters of the pump beams, probes beam, and resonant RF fields in both cells.
The scheme used in this experiment is characterized by the following additional advantages. Tuning the circularly polarized pumping beam into resonance with the F = 3 ↔ F = 3, 4 transition (hereinafter, prime indexes refer to the optical excited state) makes it possible to increase the signal several times compared to the widely used F = 4 ↔ F = 3, 4 pumping scheme. This effect, first discovered in [21], was studied in detail for buffer gas cell in [25], and for a coated cell -in [26]. The operation of the magnetometer in M X mode provides a high response speed unattainable in the M Z scheme, and the use of two-beam scheme makes it possible to further increase the sensitivity as compared to [21]. The transfer of the signal frequency to tens of kilohertz allows, in comparison with SERFs, to significantly reduce the effect of low-frequency flicker noise of lasers and magnetic environment. The balanced detection scheme is insensitive to the amplitude noise or the probe laser. And, finally, the use of intensive laser pumping makes it possible to implement the "stretched state" [19] and thereby further increase the sensitivity. Signals in two pairs of cubic cells with face lengths of 5 and 8 mm were studied over a wide range of temperatures and pumping parameters. We have defined the conditions under which the maximum sensitivity is achieved. Ultimate sensitivity is defined as the ratio of the magnetic resonance slope to the spectral amplitude of light shot noise [16]. It turned out that for the 5 mm cells, the ultimate achievable sensitivity is about 9 fT/ Hz, whereas for the 8 mm cells it amounts to about 5 fT/ Hz. At the same time, the real level of low-frequency noise of the magnetic field in the shield we used was about three orders of magnitude higher than this value; therefore, the direct demonstration of the sensor's sensitivity required a number of additional efforts, in particular, the implementation of a gradiometer configuration based on two independent sensors [20].
It is also necessary to take into account that the quantum projection noise in optical sensors that use QND detection, can be comparable to the light shot noise. Quantum projection noise manifests itself under strong optical pumping in an increase in the spectral noise density at the center of the magnetic resonance line. As shown in [27], in "stretched state" the spectral amplitude of the projection noise increases (at its constant power) due to the narrowing of the magnetic resonance line. This circumstance significantly reduces the positive effect of the "stretched state".
The gradient noise spectra are shown in Figure 2 together with the calibration signal. Acoustic noise is clearly dominant at low frequencies; this is due to the fact that the optical scheme and the magnetic shield were mounted on separate tables, the length of the optical path in each sensor was about three meters, and the light beams were exposed to air convection and vibration. In the future we expect the use of single-mode polarization-maintaining fibers to largely eliminate this problem.
The intrinsic (measured in an acoustic noise-free frequency range 70 -100 Hz) noise level of a gradiometer based on 5 mm cells is about 30 fT/ Hz, while for a gradiometer with 8 mm cells it is about 18 fT/ Hz. The sensitivity of each sensor in this frequency domain can be estimated by dividing these values by 2, and amounts approximately to 21 fT/ Hz for the 5 mm cell sensor, and 13 fT/ Hz for the 8 mm cell sensor. The ratio of the sensitivities, as expected, is roughly equal to the root of the cell volumes ratio, which indirectly indicates the principal role of noise as the limiting factor affecting the sensitivity [16].

III. DESIGNING OPTIMAL PROBE
In order to maximize the information transfer rate between the activity of neural sources and the signals registered by the OPM sensors here we propose a simple and efficient method to derive probe configuration based on the well-accepted in MEG community electromagnetic forward modelling. In doing so we will maximize the information transfer rate between the vector of source activity and the vector of signals measured by our sensors.
The underlying geometry of this problem is illustrated in Figure 3 where the cerebral cortex enclosed into the head that serves as volume conductor is shown. It is implied that our OPMs are to be located on the scalp. The fact that the OPM proposed here is sensitive to the total magnetic field complicates solving the MEG inverse problem and makes the methods based on the linear observation model inapplicable. Therefore it is desirable to operate the OPMs in the vector magnitometer mode which can be accomplished by applying large external magnetic field. In this case the sensitivity axis of our OPM will coincide with the direction of this field at the OPM's location. To avoid the interaction between the neighbouring sensors a possible solution is to perform the measurements under the external field created by a large external coil. In this case, sensitive axes of all the magnetometers located on the scalp will coincide, for example, with the global vertical axis as shown with red arrows in Figure 3.
Suppose we have M magnetometers at our disposal and we want to place them on the scalp forming an optimal probe that maximizes the amount of mutual information between the vector of activity of neuronal sources s(t) = [s 1 (t), . . In order to derive the expression for the amount of information present in the probe signals about the activity of the underlying neuronal sources we start with the observation equation linking neuronal activity vector s(t) at time t to our measurements vector x(t) registered by the array of M sensors: Fig. 3. The meshes that approximate the scalp and the surface of the cerebral cortex. At each node of the shell we also show a vertically oriented vector coinciding with the direction of the externally applied magnetic field that defines sensitivity axis of our magnetometer when operating in such a field.
where G = g T 1 ; . . . ; g T N is a forward model matrix whose rows g i are the sensor lead fields. For compactness in the presentation that follows we will stick to the fixed orientation source model, however the ideas presented can be easily extended to handle neuronal sources with arbitrary orientation. The noise term n(t) represents the inevitable measurement noise.
Provided that the additive noise n i (t) and the neuronal activity signal x i (t) = g T i s(t) in the i-th sensor measurements channel are independent and normally distributed, it can be shown [11] that the amount of information I i in one signal sample (information rate) of the i-th channel can be expressed through the signal-to-noise ratio γ i as I i = 0.5log(γ i + 1). For simplicity, but without loss of generality, we first assume that the measurement noise covariance matrix C = E{n(t)n(t) T } = I, which can always be achieved by a data whitening procedure performed by multiplying the data vector by the inverse square root C −1/2 of the noise covariance matrix. Assuming that the activity of neuronal sources s(t) on the entire cerebral cortex is characterized by a unit covariance matrix R = E{s(t)s T (t)} = I, then the signal-to-noise ratio in the i-th sensor is expressed simply as the square of the norm of the i-th row g T i of the forward model matrix G, In case sensor and source space covariance matrices can not be modelled as identity matrices, i.e. C = I and R = I we introduce matrix Γ = C −1/2 GR 1/2 . Then, the vector of channel SNR values is expressed as γ = diag(ΓΓ T ) = diag(C −1/2 GRG T C −1/2 ).
To calculate the total amount of information contained in one time sample of signals from all sensors in the array, we would need to sum the amount of information over all sensors. However, since lead fields of the sensors are not orthogonal, i.e. g i = g j for i = j and so are the rows of Γ, a simple summation of information content over all sensors will lead to an overestimation of the amount of information in the probe measurements.
In order to understand how to compute the amount of information in this case let us orthogonalize the rows of Γ by multiplying this matrix on the left with the eigenvectors of matrix A = ΓΓ T , i.e.Γ = E T Γ where E is the matrix whose columns are the eigenvectors of A.
Therefore, the vector of SNR values in the orthogonalized channels is given as γ = diag(E T ΓΓ T E) = λ, where λ is simply the vector of eigenvalues of A = ΓΓ T . Therefore, the total amount of information contained in a single sample of the data measured by our array of sensors is given simply as where λ i (A), i = 1, ..., M denotes the i-th eigenvalue of A. Note, that under the assumption of normality of s(t) and n(t) this is the most general form of the expression for the amount of information about the activity of cortical sources in the MEG sensor array data that allows us to account for arbitrary correlation structure of the neuronal activity s(t) and that of the additive sensor noise n(t). Modern QuSpin OPM sensors can measure magnetic field along the three mutually orthogonal axes and such vector measurements can also be potentially incorporated in the approach proposed here. Properly specified measurement noise covariance C can accommodate different sensitivity of the measurements for the three orthogonal axes. The fundamental mechanisms of brain functioning, its inherent symmetry and the presence of so-called default mode networks result into a great deal of correlation in the activity of cortical sources that can be statistically modelled with source-space covariance matrix R.
For convenience, we will limit the scalp area where the sensors can be potentially located to the upper part of the head surface, see Figure 3. We will also assume that the sensors can only be placed at the nodes of the scalp mesh and that the sensors' sensitive axis is aligned with the local normal shown with the red arrow. There are K = 546 such nodes in the head model we used here. We have at our disposal M < K sensors and we need to distribute them in the optimal way over K = 546 positions, so that the resulting sensor array would maximize the amount of information I contained in one time slice x(t). In what follows we will only consider the measurements of the radial field component oriented perpendicularly to the scalp surface (red arrow in  The discrete formulation presented above reduces to an NPcomplete problem of choosing M from a total number of K nodes of the grid of sensor positions to maximize the amount of information (1). Suppose M = 80 and K = 546, in this case the number of possible placement options is on the order of 10 97 which rules out a brute-force approach to this problem.
To solve this optimization problem we perform a greedy search in combination with the orthogonalization of the rows of the forward matrix G with respect to the lead field vectors of the already positioned sensors. As the first sensor in our array we simply take the sensor with index i * corresponding to the largest element of diag(A). Then, the next step is to remove from the forward model matrix G the "coverage" of the source space already furnished by this first sensor located at the scalp mesh vertex with index i * . This can be accomplished using projection of all rows of G away from the lead-field of the i * -th sensor with projection operator P 1 = I − 1 ||g i * || 2 g i * g T i * applied to the rows of the forward operator G as G 1 = (P 1 G T ) T = GP 1 . Then, using the projected forward matrix G 1 we compute A 1 = C −1/2 G 1 RG 1T C −1/2 and again choose the sensor with the largest diagonal element in the modified SNR matrix A 1 . We repeat this procedure until the desired number of sensors is not positioned. Note that on the k + 1st iteration when we have already selected positions for k sensors, in order to obtain G k the lead-field projection procedure is applied to the original matrix G and is based on the projection matrix P k computed using the original lead fields corresponding to k locations on the scalp computed so far. In other words on the k + 1st iteration the projection matrix is computed as P k+1 = I − [g i∈Ω k ][g i∈Ω k ] † where i ∈ Ω k stands for the set of indices of all k locations selected during the first k iterations, † denotes matrix pseudoinverse and in brackets we have matrices comprising k lead fields of the positioned sensors.
We applied the developed method to the anatomical information taken from one of the examples included in the standard distribution set of Brainstorm software [28], a widely used Matlab-based package for processing MEG and EEG data. The shells approximating the scalp and cerebral cortex are shown in Figure 3.A For simplicity we assumed sourcespace and measurement noise covariance to be identity matrices. Having placed the dipole sources at N = 15002 grid nodes of the cerebral cortex (gray shell) and using the Sarvas expressions [29] for the magnetic filed outside the spherically symmetric conductor approximating the head we first constructed the forward model matrix G with each row corresponding to one of K = 546 locations on the scalp surface where a sensor can be potentially placed. The result of distributing M = 80 sensors over the scalp surface is shown in Figure 4 We have then conducted local optimality check by perturbing a finite number of locations around the found optimum. The results of such a study are shown in Fig. 3. We explored three different probes with 20, 40 and 80 sensors. The presented descending curves correspond to the values of the sorted in the descending amount of information present in the probes in which the indices of all but two sensors correspond to those found by our algorithm, and the indices of the other two sensors are taken from all possible K = 546 locations. The values presented are normalized to the amount of information present in the optimal probe. All the curve values do not exceed 1.0 which means that all locally perturbed probes contain less information than the optimal probe of the corresponding size.
We have also performed 100000 Monte-Carlo iterations where we placed sensors in M = 80 randomly selected positions out of possible K = 546 locations and computed the amount of information in thus obtained sensor array. As can be seen from the graph in Figure 6, the amount of information in the sensor array designed using the proposed algorithm is significantly larger than that in the probe comprising randomly selected sensor positions. Assessment of the local optimality of the found solution. The three falling curves corresponds to the values of the amount of information I normalized to that contained in the optimal probe sorted in the descending order for the array of 20, 40 and 80 sensors. For the perturbed probe the locations of all but two sensors correspond to those found by our algorithm, but the locations of the other two sensors are randomly chosen from all possible pairs of K = 546 locations. Fig. 6. The result of a Monte Carlo study comparing the amount of information in the signals measured by radially oriented sensors located at M = 80 randomly selected vertices (blue solid) with the amount of information present in the data measured with our quasi-optimal probe (red-brown dotted line).

CONCLUSION
We investigated the possibility of creating a non-zero magnetic field sensor based on a two-beam MX scheme with intense laser pumping. We have shown that this scheme meets sensitivity requirements for application in magnetoencephalography. The intrinsic sensitivity reaches approximately 13 fT/ Hz when using a 0.5 cm 3 vapor cell as a sensing element. Acoustic noise prevailing at low frequencies can be suppressed by eliminating long (in our case, about three meters for each sensor) free space optical paths. The described nonzero field sensor differs from the SQUID and SERF sensors: it measures not the projection of the magnetic field vector, but its modulus. But if a weak magnetic signal appears as a complement to a strong constant field, the sensor turns out to be sensitive mainly to the projection of the signal along the main field and becomes a vector magnetometer. Additionally, we have described a simple algorithm to design a probe from a finite set of OPM sensors. For the sake of simplicity our presentation here has been restricted to the case of vertically oriented sensors. Given the general expression for the amount of information in the probe additional assumptions can be easily introduced by choosing the appropriate covariance matrices R and C reflecting the statistical properties of neuronal activity spatial distribution and measurement noise correspondingly. The proposed probe optimization procedure may help to design custom OPM arrays with reduced cost and optimal configuration in the context of a specific application.