Hybrid Physics-Informed Neural Network for the Wave Equation With Unconditionally Stable Time-Stepping

Abstract—This letter introduces a novel physics-informed approach for neural network-based 3-D electromagnetic modeling. The proposed method combines standard leap-frog time-stepping with neural network-driven automatic differentiation for spatial derivative calculations in the wave equation. This methodology effectively addresses the challenge of accurately modeling high-frequency electromagnetic fields with physics-informed neural networks, often characterized as "spectral bias," in the time domain. We demonstrate that the resultant numerical scheme enables unconstrained time-stepping with respect to stability, in contrast to the finite-difference time-domain method, which is subject to the Courant stability limit. Furthermore, the use of neural networks allows for seamless GPU acceleration. We rigorously evaluate the accuracy and efficiency of this finite-difference automatic differentiation approach through comprehensive numerical experiments.

However, there are two fundamental problems in conventional PINN formulations [15]. First, there can be a notable discrepancy in the convergence rates among the various terms of a PINN loss function. Wang et al. [16] demonstrated that the gradients of the different loss terms (i.e., "domain loss," "boundary loss," and "initial loss") in PINN models become unbalanced for partial differential equation (PDE) solutions exhibiting high-frequency and multiscale characteristics, often resulting in training failure. Second, PINNs suffer from "spectral bias" [17], [18], a frequently observed phenomenon in deep fully connected networks that impedes their ability to learn high-frequency functions. These problems limit the applicability of PINNs to general time-domain electromagnetic modeling, where high-frequency features are commonly encountered.
To address these challenges, we introduce a hybrid approach combining finite difference (FD) and AD techniques for training PINNs. Specifically, standard finite differences are employed for the time updates of the electromagnetic field components, while AD is applied to the spatial derivatives of the wave equation. In contrast to PINNs that rely on fully connected neural networks [19], [20], the hybrid FD-AD approach employs convolutional neural networks, enabling simultaneous training of all sampling positions at a given time step. In addition, it allows for stacking successive time steps during the training stage (see Fig. 1). These characteristics significantly enhance training efficiency compared with conventional PINNs. Moreover, we show that this method allows for unconstrained time-stepping, unlike the finite-difference time-domain (FDTD) method, which is subject to the Courant-Friedrichs-Lewy (CFL) limit.

II. METHODOLOGY: FD-AD
We train a U-Net [21] to solve the wave equation for the electric field

∇²E − με ∂²E/∂t² = 0    (1)

where ε is the permittivity, μ is the permeability, and E is the electric field vector. By approximating the second-order partial derivative of the electric field with respect to time in (1) with a centered FD, and using AD for the partial derivatives of the electric field in the spatial domain, the update equation for the electric field becomes

E^{n+1} = 2E^n − E^{n−1} + (Δt²/(με)) ∇²E^n    (2)

where Δt is the time step, ∇² is the AD-based Laplacian operator computed by the neural network, and E^{n+1}, E^n, and E^{n−1} represent the electric field at the n+1, n, and n−1 step, respectively. The left-hand side of (2) can be used to optimize the prediction of the electric field for the n+1 step. Considering (2) at N consecutive time steps,

E^{n+i+1} = 2E^{n+i} − E^{n+i−1} + (Δt²/(με)) ∇²E^{n+i},  i = 0, 1, ..., N − 1.    (3)

Then, we derive the following loss term to replace the mean square error for the computational domain:

L_r = (1/(N N_d)) Σ_{i=0}^{N−1} Σ_{j=1}^{N_d} | E_j^{n+i+1} − 2E_j^{n+i} + E_j^{n+i−1} − (Δt²/(με)) (∇²E^{n+i})_j |²    (4)

where N_d is the number of sampling points within the computational domain. For the Dirichlet boundary condition, the corresponding boundary loss term is

L_b = (1/(N N_b)) Σ_{i=1}^{N} Σ_{j=1}^{N_b} | E_{b,j}^i |²    (5)

where N_b is the total number of sampling points at the boundary and E_{b,j}^i represents the corresponding electric field.

A unit-amplitude Gaussian pulse is chosen as the excitation source in this study. Training for the hybrid PINN begins at the end of the Gaussian pulse duration, at the time step when the Gaussian decays to exp(−9). The input of the U-Net consists of the electric field within the computational domain at that time step. Since the input already incorporates the initial condition into the U-Net, the loss function of the physics-informed U-Net includes only the loss for the electric field within and on the boundary of the computational domain:

L_PINN = L_r + L_b.    (6)

The U-Net optimizes its parameters to decrease L_PINN through back-propagation during the training process. By replacing AD with finite differences for time-stepping, we overcome a common challenge of PINNs: the gradient of the loss function tends to vanish when it is computed over small time scales. As a result, our proposed hybrid PINN structure can effectively model high-frequency phenomena without the limitations imposed by this vanishing-gradient problem in conventional PINNs.

A schematic diagram illustrating the proposed procedure is shown in Fig. 1. From Fig. 1 and the loss function L_r in (4), it can be observed that the proposed hybrid PINN, combining finite-differencing in time and AD in space (FD-AD), is capable of simultaneously predicting a batch of N time steps. We define N as the "batch size," i.e., the number of time steps in each batch. Furthermore, the time step within each batch is referred to as the "division time step" (Δt_FDAD). Then, an effective time step, representing the time step for each round of the FD-AD prediction, is defined as follows:

T_eff = N Δt_FDAD.    (7)

In the subsequent sections, we will systematically benchmark the performance of the hybrid PINN, illustrating the roles of N and Δt_FDAD, using 3-D inhomogeneous cavity simulations.
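As an illustrative sketch only (not the authors' implementation), the centered-difference time update and the effective time step described above can be written as follows. The finite-difference Laplacian below merely stands in for the AD-based Laplacian that the trained network would supply, and all function names are hypothetical.

```python
import numpy as np

def laplacian_fd(E, dx):
    """7-point finite-difference Laplacian on the interior of a 3-D grid.
    Stand-in for the AD-based Laplacian supplied by the neural network."""
    lap = np.zeros_like(E)
    lap[1:-1, 1:-1, 1:-1] = (
        E[2:, 1:-1, 1:-1] + E[:-2, 1:-1, 1:-1]
        + E[1:-1, 2:, 1:-1] + E[1:-1, :-2, 1:-1]
        + E[1:-1, 1:-1, 2:] + E[1:-1, 1:-1, :-2]
        - 6.0 * E[1:-1, 1:-1, 1:-1]
    ) / dx**2
    return lap

def leapfrog_update(E_n, E_nm1, lap_E_n, dt, eps, mu):
    """Centered-difference update: E^{n+1} = 2 E^n - E^{n-1} + dt^2/(mu*eps) * Lap(E^n)."""
    return 2.0 * E_n - E_nm1 + (dt**2 / (mu * eps)) * lap_E_n

def effective_time_step(N, dt_fdad):
    """One FD-AD round predicts a batch of N steps, advancing time by N * dt_FDAD."""
    return N * dt_fdad
```

With a zero Laplacian, the update reduces to the linear extrapolation 2E^n − E^{n−1}, which is a quick sanity check on the sign convention of the centered difference.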

III. ACCURACY AND EFFICIENCY ANALYSIS OF THE HYBRID PINN METHOD
In this section, we examine the influence of two critical hyperparameters, namely Δt_FDAD and N, on the accuracy and efficiency of the hybrid FD-AD scheme. We show that Δt_FDAD is subject to the Nyquist limit for sampling the simulated field waveforms, rather than to a CFL-type stability limit.
The FD-AD-based hybrid PINN method is implemented using a modified U-Net architecture [21]. The conventional U-Net structure is primarily designed for 2-D data. In this study, we extend the U-Net framework by incorporating 3-D convolutional kernels to address the specific requirements of modeling wave propagation in 3-D.
The U-Net structure is shown in Fig. 2; it consists of two branches: a down-sampling route and an up-sampling route. There are five corresponding blocks in each of the two branches. Each block contains two 3 × 3 × 3 convolution filters with a rectified linear unit (ReLU) activation function. The padding size of these two convolutional layers is 1 on each boundary, and the stride of the convolutional kernels is also 1, so the output dimensions do not change within each block. In the down-sampling route, there is a max-pooling layer between neighboring blocks, which decreases the size of the feature maps. The kernel size of the max-pooling layer is 2 × 2 × 2; therefore, after each down-sampling block, the feature map is halved in each dimension. In the up-sampling route, there is a transposed convolutional layer between each pair of blocks, also with a kernel size of 2 × 2 × 2, so after each block the feature map is doubled in each dimension.
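The size bookkeeping described above is easy to verify. The helper below is a purely illustrative sketch, assuming that each 2 × 2 × 2 max-pooling between neighboring blocks halves every spatial dimension.

```python
def down_sampling_shapes(shape, levels):
    """Feature-map shapes along the down-sampling route: one 2x2x2 max-pooling
    between neighboring blocks halves every spatial dimension."""
    shapes = [tuple(shape)]
    for _ in range(levels - 1):  # levels blocks -> levels-1 pooling layers
        shape = tuple(s // 2 for s in shape)
        shapes.append(shape)
    return shapes
```

For the 32 × 32 × 64 sampling grid used later in this letter, a five-level down-sampling route yields shapes from 32 × 32 × 64 down to 2 × 2 × 4; the up-sampling route reverses this sequence with the 2 × 2 × 2 transposed convolutions.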
Finally, between the down-sampling route and the up-sampling route, there is a skip connection between each pair of corresponding blocks. As Fig. 2 shows, each skip connection copies the output of a down-sampling block directly to the corresponding block in the up-sampling route. These direct connections preserve features at different levels during training.
Fig. 3 shows the simulated cavity. In the experiments, 20 sets of data are used to train the physics-informed U-Net and another 20 sets are used for testing purposes. The dimensions of the cavities range from [20, 20, 40] mm to [30, 30, 60] mm for the training data, and from [30, 30, 60] mm to [40, 40, 80] mm for the testing data. The cavity is divided into two equal regions occupied by different dielectrics. The relative dielectric permittivities of these regions are in the range of [1, 3] for the training data and [1, 5] for the testing data. The cavity is uniformly sampled with a density of 32 × 32 × 64 points in the x-, y-, and z-directions, respectively. A Gaussian source is placed at the position (16, 16, 5) to excite fields within the cavity.
For validation purposes only (i.e., not for training), we use an FDTD solver to generate ground-truth data for the testing set. In the FDTD simulation, the cavity is divided into 32 × 32 × 64 cells in the x-, y-, and z-directions, respectively. The cell sizes Δx, Δy, and Δz are in the range of 0.625–1.250 mm. The relative error (E_rel) for each division time step is computed as

E_rel = √( Σ_{i=1}^{N_x} Σ_{j=1}^{N_y} Σ_{k=1}^{N_z} |E_p − E_r|² / Σ_{i=1}^{N_x} Σ_{j=1}^{N_y} Σ_{k=1}^{N_z} |E_r|² )    (8)

where N_x, N_y, and N_z are the numbers of cells in each direction, E_p is the predicted electric field from the PINN, and E_r is the electric field computed by FDTD.
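A minimal sketch of this error metric, interpreting it as a relative L2 norm accumulated over all grid cells (an assumption on the exact normalization used in the letter):

```python
import numpy as np

def relative_error(E_p, E_r):
    """Relative L2 error between a predicted field E_p and a reference field
    E_r, accumulated over all N_x * N_y * N_z cells of the grid."""
    return np.sqrt(np.sum(np.abs(E_p - E_r) ** 2) / np.sum(np.abs(E_r) ** 2))
```

A uniform 10% amplitude error in the prediction yields a relative error of 0.1 under this definition, which makes the metric easy to sanity-check.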

A. Accuracy and Runtime as a Function of the Division Time Step
The division time step represents the time step within each batch of the hybrid PINN prediction. In this benchmarking analysis, the division time step factor (m) relates Δt_FDAD to the CFL limit of an FDTD method applied to the same mesh and is defined as follows:

m = Δt_FDAD / Δt_FDTD.    (9)

In (9), Δt_FDAD is the division time step of the FD-AD-based hybrid PINN, and Δt_FDTD is the maximum time step allowed by the CFL limit in FDTD, considering the aforementioned spatial discretization. The batch size is set to N = 1 for each value of m.

From Fig. 4, it is evident that the mean relative error of the FD-AD prediction remains at a low level until m = 5.8, i.e., well exceeding the stability limit of the corresponding FDTD method. Beyond this threshold, the fields remain stable, yet the relative error increases dramatically. To further explore this increase, we compute the frequency spectrum of the fields at the position (15, 15, 31) for both FD-AD and FDTD in a testing case. The magnitudes of the Fourier transforms of the FD-AD and FDTD electric fields are denoted as Ẽ_p and Ẽ_r, respectively. Fig. 5 shows that the predictions obtained by FD-AD are in good agreement with the FDTD results for frequencies up to 30 GHz.

Then, we compute the number of sampling points, N_T, within a single period as a function of frequency. For a specific frequency point f_i, N_T = 1/(f_i Δt_FDAD). Note that Δt_FDAD = m Δt_FDTD, where Δt_FDTD depends on the cell size of the FDTD simulations and the permittivities of the cavities, with a minimum of 1.29 × 10⁻¹² s and a maximum of 1.68 × 10⁻¹² s in the testing set. Therefore, N_T varies within a range for a fixed m in different testing cases, even at a fixed frequency point. In addition, the mean relative error between Ẽ_p and Ẽ_r over the testing set is defined as follows:

F_rel = (1/N_test) Σ_{k=1}^{N_test} | Ẽ_p − Ẽ_r | / | Ẽ_r |    (10)

where N_test is the number of testing cases. The range of N_T is shown in Fig. 6, along with the mean relative error (10) of all testing data. Through this comparison, it is evident that the relative error (10) increases significantly when N_T approaches the Nyquist limit of two points per period.

In summary, this study shows that Δt_FDAD is not subject to a CFL-like stability limit, but rather to the Nyquist limit alone. Therefore, as m increases, more frequency components of Ẽ exceed the frequency threshold associated with the Nyquist limit. At m = 6.0, this phenomenon becomes pronounced, leading to the rapid increase of the training error observed in Fig. 4. In addition, we also observe small fluctuations of the mean relative error in the 0–30 GHz range in Fig. 6. The noted error peaks correspond to F_rel < 3.3% and represent minor variations in the accuracy of the trained PINN as it determines fields at versus in-between resonant frequencies.
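The Nyquist bookkeeping above is simple enough to verify numerically. The helper below is illustrative, using the Δt_FDTD range quoted for the testing set; it is not code from the letter.

```python
def samples_per_period(f, m, dt_fdtd):
    """N_T = 1 / (f * dt_FDAD), with dt_FDAD = m * dt_FDTD: the number of
    samples per period of a frequency component f under the FD-AD division
    time step."""
    return 1.0 / (f * m * dt_fdtd)
```

At m = 5.8 and f = 30 GHz, the quoted Δt_FDTD range of 1.29–1.68 × 10⁻¹² s gives N_T between roughly 3.4 and 4.5, still above the two-samples-per-period Nyquist limit; higher frequency components cross that limit first as m grows.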

B. Accuracy and Runtime as a Function of the Batch Size
The second parameter under consideration is the batch size (N), which determines the number of time steps in each output of the hybrid PINN. In the following experiments, the division time step factor is set to m = 5.8, which is the maximum division time step factor determined by the previous benchmark experiments. To find the largest effective time step (T_eff), we let N > 1 and measure the mean relative error over the training and testing databases, as well as the relative training time, as N varies.
The results in Fig. 7 show that a gradual increase in the batch size leads to a slight increase in the relative error up to N = 14. At that point, for a U-Net with five levels, T_eff exceeds the maximum time step allowed by the CFL limit in the corresponding FDTD simulation by a factor of 81.2. Furthermore, we extend the complexity of the U-Net by incorporating an additional block to investigate its capacity to accommodate a larger batch size. The results are also included in Fig. 7, which indicates that the maximum feasible batch size reaches N = 22 for the U-Net with six levels. Hence, the increase in the relative error observed for the five-level U-Net was related to the capacity of the U-Net and can be controlled by adding more levels. However, it is important to note that the increased U-Net complexity results in a notable increase in the training time.
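The factor of 81.2 quoted above is simply the product N × m, since one FD-AD round advances time by N × m × Δt_FDTD. A one-line illustrative check:

```python
def cfl_speedup_factor(N, m):
    """Ratio of the effective FD-AD time step (N * m * dt_FDTD) to the
    CFL-limited FDTD time step dt_FDTD."""
    return N * m
```

For N = 14 and m = 5.8 this gives 81.2; for the six-level U-Net's N = 22 at the same m, the corresponding factor would be 22 × 5.8 = 127.6 (a value not quoted in the letter).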
The training of the hybrid PINN with N = 14 and m = 5.8 takes 37.8 min on an Nvidia A100 GPU. After the PINN is trained, the execution time of the proposed hybrid PINN is 0.17 s per case on average on an Nvidia 3060 GPU, and 1.68 s on an Intel i5 CPU. In contrast, the FDTD simulation requires 3.49 s on the same CPU. Fig. 8 depicts the results for two testing cases. The size of these cavities is 35 mm × 35 mm × 70 mm.

IV. CONCLUSION
This letter has made two key contributions to scientific machine learning for computational electromagnetics. First, by combining FDTD-type leap-frog time-stepping with standard PINN-based AD in the space domain, we produced a 3-D numerical scheme that can effectively model high-frequency phenomena, overcoming the well-known limitations of standard PINNs. Second, we showed that the time-stepping of the proposed scheme is not subject to any CFL-like stability limit; further study indicated that the time step is only subject, as expected, to the Nyquist limit. These two contributions create new possibilities for the PINN-based solution of realistic 3-D electromagnetic problems.

Fig. 1. Time-stepping of electromagnetic fields with the proposed hybrid PINN methodology combining FD in time and AD in space (FD-AD).

Fig. 2. Architecture of the U-Net employed in the hybrid PINN.

Fig. 4. Mean relative error and relative training time as a function of the time step factor.

Fig. 5. Frequency spectra Ẽ_p and Ẽ_r of the FD-AD and FDTD fields.

Fig. 6. F_rel and the corresponding N_T over the testing set.

Fig. 7. Mean relative error and relative training time as a function of N.