Accurate Prediction of Electric Fields of Nanoparticles With Deep Learning Methods

Mengmeng Li, Senior Member, IEEE, and Zixuan Ma
Abstract-Three deep learning models were designed in this paper to predict the electric fields of single nanoparticles, dimers, and nanoparticle arrays. For single nanoparticles, the prediction error was 4.4%. For dimers with strong couplings, a sample self-normalization method was proposed, and the error was reduced by an order of magnitude compared with traditional methods. For nanoparticle arrays, the error was reduced from 28.8% to 5.6% compared with previous work. Numerical tests confirmed the validity of the proposed deep learning models, which have potential applications in the design of nanostructures.

I. INTRODUCTION
Surface-enhanced Raman scattering (SERS) [1], [2] is a spectrum-detection technology that has been widely used in numerous fields. In recent years, it has been applied to single-molecule detection [2], spectrum analysis [3], biosensors [4], surface science [5], and cancer therapy [6], with the advantages of real-time, in-situ detection and high sensitivity. Plasmons excited by electrons on the surface of a nanometal can significantly enhance the electromagnetic (EM) fields, especially for SERS from dimers. The SERS intensity depends strongly on the frequency of the incident EM waves, the structure of the nanometal surface, and the material properties. It is therefore particularly important to determine the relationship between the structure and the near-field intensity. Currently, the main numerical analysis methods are the finite-difference time-domain (FDTD) method [7], discrete-dipole approximation (DDA) [8], finite element method (FEM) [9], method of moments (MoM) [10], and fast integral methods (e.g., wideband nested equivalent source approximation (WNESA) [11]). These methods can accurately calculate the near-fields of structures with arbitrary shapes, but they incur significant time costs.
Deep neural networks (DNNs) have been widely used in various fields such as disease diagnosis [12], image recognition [13], and strategy formulation [14]. Recently, DNNs have also been used to solve EM and optical problems, replacing numerical simulation methods [15]. A DNN was employed to achieve a simultaneous inverse design of the materials and structural parameters of core-shell nanoparticles [25]. The transmission spectra and structural parameters of complex plasma nanostructures were successfully predicted using a DNN. To avoid the nonunique response-to-design problem [26], forward and inverse DNNs were connected in series. A DNN was employed to achieve forward and backward predictions of the far-field optical properties of a variety of nanostructures [26]. A much denser sampling of the near fields was employed to predict the coupling fields of the dimers accurately. A convolutional neural network (CNN)-based method was proposed to visually predict solutions to EM problems [27]. A CNN model was designed and trained to map electric fields (E-fields) with low precision to those with high precision. A U-Net combined with a residual architecture was designed to predict the E-fields of nanoparticles with different shapes [28]; the U-Net architecture was modified into an encoder-decoder structure to obtain higher accuracy.
This study has three main contributions. (1) A deep learning model based on a transposed convolutional neural network (TCNN) was designed to predict the E-field of a single nanoparticle. The average prediction error of this model was 4.4%. (2) A fully connected neural network (FCNN) model combined with the sample self-normalization (SSN) method was designed for E-field prediction at the central section of a dimer. Compared with the traditional global normalization method, the average prediction error was reduced from 5.65% to 3.60%. For samples with a drastically changing E-field, the prediction error was reduced by an order of magnitude. (3) The feed-forward denoising convolutional neural network (DnCNN) [29] model was fine-tuned and used to predict the E-fields of nanosphere arrays. Mapping from a low-precision E-field to a high-precision E-field was achieved using the DnCNN model. The average prediction error was reduced from 28.8% to 5.6% compared with the method proposed in [27]. The time consumption for the simulation and design process of large nanoparticles was reduced in this study. The work presented in this paper provides an effective solution for the E-fields of nanoparticles and has potential applications in the rapid analysis and design of nanostructures for SERS.
Fig. 1. Simulation of a single nanosphere. The material of the nanosphere is gold (Au), and the gray part indicated in the figure is the SiO2 substrate. R represents the radius of the nanosphere. C and D are two observation surfaces, which are 1 nm from the nanosphere. M is another observation surface, which is the central section of the sphere. The distance between the nanoparticles and substrate is 1 nm [30]. The excitation source is a plane wave incident along −ẑ with a wavelength of λ, and the polarization direction is along the X-axis.

A. Electric Fields Prediction for a Single Nanoparticle
In this study, the E-fields were obtained using FDTD Solution software. The single-nanosphere model is shown in Fig. 1. The E-fields were obtained on the two observation planes (C and D, with dimensions of 200 nm × 200 nm). A perfectly matched layer (PML) boundary was used around the simulation model. A symmetric boundary was used in the X direction and an anti-symmetric boundary in the Y direction to reduce the computation time and cost. Table I lists the sampling parameters of the dataset. The dataset contained 12000 samples of E-field diagrams, and each sample contained 201 × 201 sampling points. The structural parameters were the input data of the DNN, and the E-field diagram was the output data. The dataset was divided into a training set (64%), validation set (16%), and testing set (20%), so only 7680 samples were used for DNN model training. The size of the dataset and the dimensions of each sample affect the training speed. To reduce the training time of the DNN, only 1/4 of the E-field diagram was taken as the prediction target; that is, the output dimensions were reduced to 101 × 101. To improve the prediction accuracy, the E-field value, instead of the quantized image, was used as the output data during training.
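The dataset bookkeeping described above can be checked with a few lines of Python. This is a minimal sketch: the helper names `split_sizes` and `crop_quadrant` are hypothetical, and the choice of which quadrant to keep is an assumption, since the text only states that 1/4 of the 201 × 201 diagram (101 × 101 points) was used.

```python
import numpy as np

N_SAMPLES = 12000   # dataset size stated in the text
GRID = 201          # sampling points per side of the full E-field diagram


def split_sizes(n, train=0.64, val=0.16):
    """Return (train, validation, test) counts for the stated 64/16/20 split."""
    n_train = round(n * train)
    n_val = round(n * val)
    return n_train, n_val, n - n_train - n_val


def crop_quadrant(field):
    """Keep one quadrant of a square field, including the centre row and column.

    Assumption: the lowest-index quadrant is kept; the text does not say
    which quadrant was actually used.
    """
    half = field.shape[0] // 2 + 1   # 201 // 2 + 1 = 101
    return field[:half, :half]


sizes = split_sizes(N_SAMPLES)
full_field = np.zeros((GRID, GRID))   # placeholder for one simulated diagram
quarter = crop_quadrant(full_field)
print(sizes, quarter.shape)
```

The training count reproduces the 7680 samples quoted in the text, and the cropped target has roughly a quarter of the original 40401 output values.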
The transposed convolutional neural network (TCNN) is built on transposed convolutions, also called fractionally strided convolutions or deconvolutions, in which the forward and backward passes of a convolution are exchanged [31]. Transposed convolutions have been widely used for upsampling in tasks such as semantic segmentation and super-resolution. Unlike the traditional CNN, which is used to extract image features, the TCNN is used to generate high-resolution images from low-resolution images [32]. For single nanoparticles in this study, the electric field needs to be predicted from the structural parameters. A mapping from a few parameters to a two-dimensional image is well suited to a TCNN.
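To illustrate how a transposed convolution enlarges a small feature map, the following NumPy sketch implements a single-channel, unpadded transposed convolution. It is not the layer configuration of the TCNN model; the function name, kernel, and stride are illustrative assumptions.

```python
import numpy as np


def transposed_conv2d(x, kernel, stride=2):
    """Single-channel transposed convolution without padding.

    Each input pixel scatters a scaled copy of the kernel into the output,
    so the output side length is (n - 1) * stride + k.
    """
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            r, c = i * stride, j * stride
            out[r:r + kh, c:c + kw] += x[i, j] * kernel
    return out


x = np.arange(9, dtype=float).reshape(3, 3)   # toy 3 x 3 feature map
kernel = np.ones((3, 3))                      # illustrative kernel
y = transposed_conv2d(x, kernel, stride=2)
print(y.shape)
```

With a stride of 2 and a 3 × 3 kernel, the 3 × 3 input grows to 7 × 7. Stacking several such layers is how a few input parameters can be expanded into an image-sized output.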
A TCNN layer was used as the main neural network (NN) layer. Combined with the fully connected (FC) and CNN layers, the TCNN model is shown in Fig. 2. This model contained one input layer, 12 hidden layers, and one output layer (auxiliary layers, such as batch normalization and flattening layers, were not considered). The input layer contained three neurons: the radius (R), wavelength (λ), and position of the observation surface (O). The output data were E-field diagrams. Except for the last layer, the activation functions of the other layers were rectified linear unit (ReLU) functions, and there was no activation function for the final layer. The mean square error (MSE) was used as the loss function and Adam [33] was used as the optimizer.

B. Electric Fields Prediction for the Dimer
The dimer model used for the simulation is shown in Fig. 3, and the observation surface (M in Fig. 1) shows an E-field diagram with 101 × 201 sampling points. As with the single-nanosphere model, PML boundaries were used around the model, and symmetric boundaries were used to reduce the computational cost. Table II lists the sampling parameters of the dataset. The dataset contained 13530 examples, which were divided into a training set (64%), validation set (16%), and testing set (20%). Each example contained three structural parameters (distance (D), radius (R), and incident wavelength (λ)) and an E-field diagram. The near- and far-field optical properties of dimers are strongly dependent on the interparticle distance [35]. When the interparticle distance decreases to 1 nm, a prominent coupling mode appears [26]. Therefore, D = 1 nm was added to the sample library; at this distance, the E-field intensity at the center of the dimer changed significantly. For D ≥ 2 nm, the coupling mode is less prominent. To reduce the size of the dataset, the sampling step was set to 2 nm, and the sampling range was set to 2 nm to 20 nm. The DNN model described in this section began with a one-dimensional (1D) E-field and was then expanded to a two-dimensional (2D) E-field.
Fig. 3. Simulation of the dimer. D is the distance between two nanospheres, and R is the radius. In the dimer model, the radii of the two spheres are equal. The incident plane wave is along −ẑ with a wavelength of λ.
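The distance sampling described for the dimer dataset can be written out explicitly. This is a small sketch; the variable names are hypothetical, and the sampling of R and λ comes from Table II, which is not reproduced here.

```python
# Interparticle distances in nm: the strongly coupled case D = 1 nm
# plus a uniform 2 nm grid from 2 nm to 20 nm, as stated in the text.
d_values_nm = [1] + list(range(2, 21, 2))

# The stated dataset size divides evenly by the number of distances, which is
# consistent with (but does not prove) a full grid over R and wavelength.
N_EXAMPLES = 13530
examples_per_distance = N_EXAMPLES // len(d_values_nm)
print(d_values_nm, examples_per_distance)
```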
The FC layers were used to build the DNN model (known as the FCNN model), as shown in Fig. 4. The model contained 11 hidden layers. The FCNN model used ReLU as the activation function, and there was no activation function in the final layer. The loss function was the MSE, and the training optimizer was Adam. The input parameters of the model were the three structural parameters of the dimer, and the output was a 1D E-field containing 201 samples.
The normalization method affects the performance of the DNN model. In this section, the global normalization and SSN methods are described for data preprocessing. The global normalization method is a classical linear normalization method based on the maximum and minimum values of the entire dataset:
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{1}$$
where x denotes the variable samples in the entire dataset and max(x) − min(x) denotes the normalization parameter. When using the global normalization method, the prediction errors of samples with drastically varying E-fields are larger (see III.B for the results and analysis). An SSN method was proposed in this study to solve this problem. The dataset was normalized based on the maximum and minimum values of each sample. After normalization, the maximum value of each sample was one, and the minimum value was zero. Data with similar ranges were more conducive to the training and prediction of the FCNN model (see III.B for the results and analysis). A model for predicting the maximum and minimum values of each sample (known as the anti-normalization model) is presented in this section. Using the input structural parameters, the normalized E-field predicted by the FCNN can be anti-normalized to obtain the actual E-field accurately.
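The difference between the two normalization schemes can be seen on a toy dataset. The sketch below assumes that both schemes are min-max scalings, taken over the whole dataset for global normalization and per sample for SSN. The function names and the two synthetic 1D fields are illustrative and are not taken from the paper's data.

```python
import numpy as np


def global_normalize(samples):
    """Min-max scaling with one (min, max) pair for the whole dataset."""
    lo, hi = samples.min(), samples.max()
    return (samples - lo) / (hi - lo), (lo, hi)


def ssn_normalize(samples):
    """Sample self-normalization: one (min, max) pair per sample (row).

    Assumes no sample is constant; a constant row would divide by zero.
    """
    lo = samples.min(axis=1, keepdims=True)
    hi = samples.max(axis=1, keepdims=True)
    return (samples - lo) / (hi - lo), (lo, hi)


# Two synthetic 1D fields with 201 points: one weak, one strongly enhanced.
weak = np.linspace(0.5, 2.0, 201)
strong = np.linspace(1.0, 100.0, 201)
fields = np.stack([weak, strong])

g, _ = global_normalize(fields)
s, _ = ssn_normalize(fields)
print(g[0].max(), s[0].max())
```

Under global normalization the weak sample is squeezed into roughly the lowest 1.5% of the unit interval, whereas SSN maps every sample onto the full range from zero to one.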
It is feasible to combine the FCNN and anti-normalization models to predict the E-fields of dimers. A flowchart is shown in Fig. 5. During the training process, the E-fields were normalized by the SSN method. The maximum and minimum values of each example in the training set were used to train the anti-normalization model. The training targets of the FCNN model were the normalized E-fields. During the prediction process, the structural parameters were the inputs of both the FCNN and anti-normalization models. The normalized E-fields of the test set were obtained by the FCNN model, and the maximum and minimum values of the E-fields were obtained by the anti-normalization model. After anti-normalization, the actual E-fields of the test set were obtained.
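The prediction flow can be sketched as the composition of two models that share the same structural inputs. In the sketch below, `toy_field_model` and `toy_range_model` are hypothetical stand-ins for the trained FCNN and anti-normalization models; their outputs are arbitrary and serve only to show how the two predictions are combined.

```python
import numpy as np


def predict_actual_field(params, field_model, range_model):
    """Combine a normalized-field prediction with a predicted (min, max) pair."""
    normalized = np.asarray(field_model(params))   # values expected in [0, 1]
    e_min, e_max = range_model(params)
    return normalized * (e_max - e_min) + e_min    # anti-normalization


def toy_field_model(params):
    """Stand-in for the FCNN: returns a normalized 1D field with 201 samples."""
    return np.linspace(0.0, 1.0, 201)


def toy_range_model(params):
    """Stand-in for the anti-normalization model: returns (min, max)."""
    distance, radius, wavelength = params
    return 0.5, 0.5 + radius / distance


# Hypothetical structural parameters (D, R, wavelength) in nm.
field = predict_actual_field((2.0, 30.0, 600.0), toy_field_model, toy_range_model)
print(field.shape, field[0], field[-1])
```

Because the range comes from a second model, any error in the predicted minimum or maximum carries over to the restored field, so both models need to be accurate.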
It should be noted that the normalized dataset might no longer be independent and identically distributed; however, because the anti-normalization model restores the normalized E-field to the actual E-field, consistency with the actual data can be maintained.
Fig. 6. Structure of the nanosphere array. R is the radius, D is the distance, and all nanospheres have the same dimension parameters. A plane wave with wavelength λ is incident along the −ẑ direction. The E-field of 276 × 276 sampling points is obtained on plane C defined in Fig. 1.
When an FCNN model for a 1D E-field has been obtained, only a few modifications are required to obtain the 2D E-field. A coordinate parameter is added to the input parameters of the FCNN model, and a complete 2D E-field can be obtained via a series of 1D E-field predictions. TCNN layers could produce the whole 2D E-field diagram directly; however, they are not used in this model because the E-field at the center of the dimer is significantly higher than that at other locations and cannot be accurately predicted by TCNN layers.
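Assembling the 2D field from repeated 1D predictions can be sketched as follows. Here `toy_line_model` is a hypothetical stand-in for the trained FCNN with the extra coordinate input, and the normalized coordinate range is an assumption. Only the 101 × 201 grid size comes from the text.

```python
import numpy as np

N_LINES, N_POINTS = 101, 201   # 2D grid size stated for the dimer observation surface


def predict_2d_field(structure, line_model, coords):
    """Stack one 1D prediction per coordinate value into a 2D field."""
    rows = [line_model((*structure, c)) for c in coords]
    return np.stack(rows)


def toy_line_model(inputs):
    """Stand-in for the FCNN: maps (D, R, wavelength, coordinate) to 201 values."""
    distance, radius, wavelength, coord = inputs
    x = np.linspace(-1.0, 1.0, N_POINTS)
    return np.exp(-(x ** 2 + coord ** 2) * radius / distance)


coords = np.linspace(0.0, 1.0, N_LINES)   # assumed normalized line positions
field_2d = predict_2d_field((2.0, 30.0, 600.0), toy_line_model, coords)
print(field_2d.shape)
```

Each row is an independent forward pass, so the rows can also be evaluated as a single batch when the model supports it.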

C. Prediction of the Electric Fields of the Nanosphere Array
Compared with a single nanosphere, the time consumption of full-wave simulations for the nanosphere array increases dramatically. Therefore, high-efficiency calculation methods for nanosphere arrays are particularly important. Table III shows the number of iteration steps and the time required to simulate the nanosphere array in Fig. 6 using the WNESA method [11] with different iterative solution precisions. The iteration residual in the iterative solver is 0.1 for low precision and 0.001 for high precision. The time consumption and number of iteration steps with low precision are significantly smaller than those with high precision. Therefore, a fast model that maps low-precision E-fields to high-precision E-fields can be designed to obtain a high-precision E-field without large time consumption. In this section, a mapping model based on the DnCNN model is presented. The proposed method exhibited significantly better performance for the simulation of nanosphere arrays with ill-conditioned numerical linear matrix equations [34].
Fig. 7. Structure of the feed-forward denoising convolutional neural network model. BN represents the batch normalization layer, k3 indicates that the side length of the two-dimensional convolution window is three, and f64 indicates that the number of output filters in the convolution is 64. The output of the CNNs is the residual of electromagnetic solutions. It is added to the low-precision E-field to obtain the high-precision E-field.
The nanosphere array is shown in Fig. 6, and the dataset sampling parameters are listed in Table IV. Each set of data contained E-fields with low and high solution precision. The dataset was divided into a training set (720), validation set (180), and test set (120). Unlike the dimer, the observation surface of the nanosphere array is 1 nm above it (plane C in Fig. 1). Compared with the dimer, the E-field of the nanosphere array is not sharply enhanced. Thus, in this model, the global normalization method is used to normalize the dataset. The DnCNN has been widely used for image denoising, and in this study, it was fine-tuned to predict the E-field of nanosphere arrays. The low-precision E-field was the input of the model, and the high-precision E-field was the output. The structure of the DnCNN is shown in Fig. 7. The convolution module consisted of CNN, batch normalization, and activation function (ReLU) layers. Residual structures were used in the DnCNN to avoid overfitting in deep networks. The loss function was the MSE, and the training optimizer was Adam.
Fig. 9. Electric fields from two samples not in the dataset evaluated using the FDTD and TCNN model. For (a)-(c), R is 11 nm, λ is 771.9 nm, the observation surface is located at 1 nm from the bottom surface of the nanosphere (plane D in Fig. 1), and the MNRE is 3.34%. For (d)-(f), R is 40 nm, λ is 493 nm, the observation surface is located at 1 nm from the top surface of the nanosphere (plane C in Fig. 1), and the MNRE is 9.83%.
Fig. 11. Test result of generalization capability with different types of materials. For (a)-(c), R is 11 nm, λ is 502.5 nm, the observation surface is located at 1 nm from the top surface of the nanosphere (plane C in Fig. 1), and the MNRE is 4.74%. For (d)-(f), R is 39 nm, λ is 775 nm, the observation surface is located at 1 nm from the bottom surface of the nanosphere (plane D in Fig. 1), and the MNRE is 7.41%.
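The residual formulation used for the nanosphere array, in which the network output is added to the low-precision E-field, can be sketched without any network. Here the residual is taken as the high-precision field minus the low-precision field, which is the sign convention implied by that addition. The synthetic fields and function names are illustrative assumptions.

```python
import numpy as np


def residual_target(low, high):
    """Quantity the convolutional layers would need to output: high minus low."""
    return high - low


def apply_residual(low, predicted_residual):
    """Recover the high-precision estimate by adding the residual to the input."""
    return low + predicted_residual


rng = np.random.default_rng(0)
high = rng.random((276, 276))                        # synthetic high-precision field
low = high + 0.1 * rng.standard_normal((276, 276))   # synthetic low-precision field

target = residual_target(low, high)
restored = apply_residual(low, target)   # an ideal prediction recovers `high`
print(np.abs(restored - high).max())
```

In this formulation the network only has to model the solver error between the two precisions, which is typically a smaller signal than the field itself.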

III. RESULTS AND DISCUSSION
In this study, the mean norm relative error (MNRE) was used to evaluate the error between the predicted and simulated results, as given in (2). The DNN models in this study were all trained using Keras. The computer used for DNN model training was equipped with an Intel(R) Xeon(R) Silver 4114 CPU @ 2.20 GHz, 256 GB of memory, and two NVIDIA GeForce RTX 2080 Ti graphics cards.
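Equation (2) is not reproduced in this text, so the sketch below assumes one plausible reading of the metric's name: the relative error in the Euclidean norm is computed per sample and then averaged over the samples. The function name `mnre` and the toy arrays are illustrative, and the paper's exact definition may differ.

```python
import numpy as np


def mnre(predicted, simulated):
    """Mean over samples of ||pred - sim||_2 / ||sim||_2 (assumed definition)."""
    p = predicted.reshape(len(predicted), -1)
    s = simulated.reshape(len(simulated), -1)
    rel = np.linalg.norm(p - s, axis=1) / np.linalg.norm(s, axis=1)
    return rel.mean()


simulated = np.ones((2, 4, 4))   # two toy 4 x 4 fields
predicted = simulated.copy()
predicted[0] *= 1.1              # 10% error on the first sample, none on the second
error = mnre(predicted, simulated)
print(error)
```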

A. Results of the Single Nanoparticle
The dataset of single nanoparticles described in Section II-A was used for TCNN model training. During training, the batch size was 32 and the maximum number of epochs was 800. To prevent overfitting, early stopping was used: training stopped at epoch 158, and the model parameters were restored to those of epoch 108, which gave the best performance. A learning rate reduction mechanism was also employed to improve training. If the loss did not change significantly within 30 epochs, the learning rate was reduced by 50%. Fig. 8 shows the training history. The MNRE on the test set was 4.4%. To demonstrate the generalization ability of the model, comparison results for two samples not in the dataset are shown in Fig. 9. The MNRE remained below 10%, supporting the accuracy of the TCNN model for E-field predictions of a single nanosphere.
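The interplay between early stopping and learning rate reduction can be illustrated with a small framework-free simulation. The early-stopping patience of 50 is inferred from the gap between the stopping point (158) and the restored point (108) and is not stated in the text. The initial learning rate and the synthetic loss curve are hypothetical. In Keras, this behavior would typically be configured with the `EarlyStopping` and `ReduceLROnPlateau` callbacks, although the text does not name them.

```python
def simulate_callbacks(val_losses, stop_patience=50, lr_patience=30,
                       lr_factor=0.5, lr=1e-3):
    """Replay a loss history and report where training would stop."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_gain = 0   # counter used by the learning-rate rule
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= lr_patience:
                lr *= lr_factor            # halve the learning rate
                epochs_without_gain = 0
        if epoch - best_epoch >= stop_patience:
            break                          # early stopping triggers
    return {"stopped_epoch": epoch, "best_epoch": best_epoch, "lr": lr}


# Synthetic curve: improves for 108 epochs, then stays at a worse value.
losses = [1.0 / e for e in range(1, 109)] + [1.0] * 692
result = simulate_callbacks(losses)
print(result)
```

On this synthetic curve the learning rate is halved once, at epoch 138, before training stops at epoch 158 and the weights from epoch 108 are kept.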
To investigate the generalization capability of the proposed method for different types of nanoparticles, samples of Cu nanoparticles were added to the single-nanoparticle training set, and the real and imaginary parts of the refractive index were added to the input layer parameters. The network model was trained with the same structure. Two materials similar to Cu and Au (Cu-Palik and Au-Palik) were used to test the generalization ability of the model. The index parameters of these materials are shown in Fig. 10. Two representative samples from the test set are shown in Fig. 11. The MNREs of these samples were 4.74% and 7.41%, which are at the same level as those of samples within the range of the training set. The test results indicate that the model generalizes across different types of materials; this generalization can be achieved by adding samples with different parameters to the dataset rather than by redesigning the network model.

B. Results of the Dimer
For the dimer, the dataset was first normalized using the global normalization method. The FCNN model was trained using the dimer data, and the learning rate reduction mechanism was also used during training. The training history is shown in Fig. 12(a). After training, the global maximum and minimum values recorded in advance were used to anti-normalize the predicted results. The error of the model was analyzed using samples from the test set, and the MNRE was 5.65%.
The FCNN model using the global normalization method could obtain the E-field accurately on average; however, the error on several samples was large, especially for samples with large radii and short distances. As shown in Fig. 13(a) and (b), the MNREs of two test samples were 15.26% and 22.78%, respectively.
Fig. 16. Results of the electric field from the nanosphere array predicted by the denoising convolutional neural network model. The MNRE of (a)-(c) is 3.05%, and the original MNRE between the low- and high-precision E-fields is 11.92%. The MNRE of (d)-(f) is 18.56%, and the original MNRE between the low- and high-precision E-fields is 35.85%.
Because the E-fields at the center of the dimer vary drastically, they cannot be accurately predicted by the FCNN model using the global normalization method. As in (1), the normalization parameter of the global normalization method was determined by the maximum and minimum values of the entire dataset. However, as shown in Fig. 14(a), only 19.19% of the samples exhibited a maximum value greater than 10 V/m. Consequently, most of the samples were compressed to small values after global normalization, as shown in Fig. 14(b). This makes training the FCNN model extremely difficult, because the model treats samples with extremely large E-fields as abnormal samples. There will be large errors in the prediction of such abnormal samples, and the prediction error will be further amplified by anti-normalization. With the proposed SSN method, the normalized data are distributed uniformly between zero and one, as shown in Fig. 14(c), which effectively improves the accuracy of E-field prediction. The FCNN model using the SSN method was trained with the same data, model structure, and training process. The training history is shown in Fig. 12(b). The model was tested using the same test set used for the global normalization method, and the MNRE was reduced to 3.60%.
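The amplification of errors by anti-normalization can be made explicit with a short worked relation. This is a sketch that assumes anti-normalization simply inverts the min-max scaling. The symbols \hat{x}' (normalized prediction) and \delta (prediction error) are introduced here for illustration, as are the numbers in the example that follows.

```latex
\hat{x} = \hat{x}'\,\bigl(\max(x) - \min(x)\bigr) + \min(x)
\quad\Longrightarrow\quad
\delta\hat{x} = \delta\hat{x}'\,\bigl(\max(x) - \min(x)\bigr)
```

With global normalization, the scale factor is the range of the whole dataset, so the same normalized error produces the same absolute error for every sample. As a hypothetical example, for a dataset range of 100 V/m, a normalized error of 0.01 becomes 1 V/m, which is half the peak value of a sample that peaks at 2 V/m. With SSN, the scale factor is the range of the individual sample, so the absolute error scales with that sample's own field strength.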
To validate the advantages of the SSN method more clearly, Fig. 13(c) and (d) show the results compared with the global normalization results of Fig. 13(a) and (b) using the same neural network parameters. The errors were reduced from 15.26% (Fig. 13(a)) to 4.60% (Fig. 13(c)) and from 22.78% (Fig. 13(b)) to 1.50% (Fig. 13(d)). Two samples were randomly selected for testing, and the prediction results of the 2D FCNN model are shown in Fig. 15. The dimer in Fig. 15(a)-(c) had a small distance and large radius, so SERS at the center of the dimer became significantly stronger. Even so, with the SSN method, the error was only 7.18%, and a smaller error of 4.49% was achieved for Fig. 15(d)-(f) with weak couplings.

C. Results of the Nanosphere Array
With low-precision E-fields as the input and high-precision E-fields as the output, the DnCNN was trained using the nanosphere array dataset. The batch size was eight, and the number of training epochs was 1000.
To demonstrate the superiority of the DnCNN model, the surface current estimation model in [27] was employed for E-field prediction. In contrast to the model in [27], the residual network in the DnCNN model can improve training efficiency and prediction performance [29]. The performance of the two models is compared in Table V. A significantly lower error was obtained for the DnCNN model than for the model in [27]. The MNRE of the test data was reduced from 28.8% to 5.6%, and the MSE was reduced from 0.087 to 0.0047. To test the generalization ability of the model, 204 samples (20% of the initial dataset) were randomly generated outside the dataset for testing, and two results were selected, as shown in Fig. 16. Fig. 16(a)-(c) show the result with the smallest error (MNRE of 3.05%), and Fig. 16(d)-(f) show the result with the largest error (MNRE of 18.56%).

D. Time Consumption
The time requirements of the different DNNs are summarized in Table VI. The time needed by a DNN to predict one electric field is much shorter than that of a full-wave simulation. Although the total training time of the DNNs for dimers and nanoparticle arrays is much longer than the time of a full-wave simulation for one electric field, the advantage of the DNNs appears in batch predictions, such as those required in optimization.
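The trade-off between training cost and per-sample savings can be expressed as a break-even count. The sketch below uses hypothetical timings, since the values of Table VI are not reproduced here, and it ignores the time needed to generate the training dataset, which would raise the break-even point.

```python
import math


def break_even_predictions(train_time_s, sim_time_s, predict_time_s):
    """Smallest number of fields for which training plus prediction is faster
    than running one full-wave simulation per field."""
    saving = sim_time_s - predict_time_s
    if saving <= 0:
        raise ValueError("prediction must be faster than simulation")
    return math.ceil(train_time_s / saving)


# Hypothetical timings in seconds, not taken from Table VI.
n = break_even_predictions(train_time_s=3600.0, sim_time_s=60.0,
                           predict_time_s=0.01)
print(n)
```

Beyond this count, each additional prediction saves close to the full simulation time, which is why the benefit shows up mainly in parameter sweeps and optimization loops.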

IV. CONCLUSION
In this study, E-field prediction for different nanostructures based on deep learning methods was investigated. For the single-nanosphere model, a TCNN model was designed to predict the E-fields of a single nanoparticle. For dimers, an FCNN model combined with the SSN method was designed, which had a significantly higher accuracy than the global normalization method, especially for drastically varying E-fields. For the nanosphere array, the DnCNN model was developed to obtain a high-precision E-field from a low-precision E-field, with better performance than the model in [27]. In addition, the generalization capability of the single-nanoparticle model across materials was discussed. The generalization capability of the dimer and nanoparticle array models can be obtained with the same method. To generalize across different nanoparticle shapes and observation surfaces, the corresponding shape and observation-surface parameters should be added to the input parameters of the training dataset. This study has potential applications in modeling and designing nanoparticle structures for SERS.