Modeling of Coupling Effects in Neural Networks for Ship Motion Prediction

The prediction of ship motions is vital for ship maneuvering and operations at sea, but the prediction accuracy is seriously affected by nonlinear and time-varying ship dynamics. Although various types of neural networks have been used to overcome these problems, the coupling effects in ship motions lack explicit and effective modeling, thereby limiting the prediction accuracy. To address this challenge, this study proposes a novel neural network structure that can explicitly model the coupling effects while remaining frequency-aware and sequence-aware. In the proposed structure, factorization machines are utilized to model the coupling effects and reduce dimensionality. Wavelet transform is used to convey the frequency information, and the recurrent neural network (RNN) structure is applied for online prediction. Experimental results show that with the proposed structure, the root-mean-squared errors of the prediction of roll, pitch, and heave during the 10 seconds ahead are reduced by 5.24%, 3.71%, and 8.71%, respectively, compared with the values for simple RNN models. This finding verifies the outstanding performance of the proposed method despite its low time and space complexity. This study also proposes a multi-stage training trick that can help improve the training process. With this trick, the proposed structure can be applied to newly proposed RNN models as an improvement, as verified through experiments. The interpretability of the proposed method is revealed via visualization, which indicates that the proposed method can help build white-box neural network models for ship motion prediction.


I. INTRODUCTION
Ship motion prediction is a vital topic in ship maneuvering and ocean engineering [1]. The control performance of ships maneuvered through adaptive state feedback control strategies can be improved by decreasing the prediction errors of the system state [2]. Meanwhile, the planning and control performance of robotic manipulators on seaborne platforms are affected seriously by the accuracy of ship motion prediction [3] [4]. For the deployment and landing of air vehicles on board, reliable prediction of ship motions is urgently required to ensure the stability and effectiveness of the shipboard landing control system [5].
However, accurate ship motion prediction is difficult for two main reasons. First, ship dynamics are time-varying [4] and nonlinear [6]. Time-varying environmental factors, such as waves [7], currents [8], and winds [9], can directly affect ship motions. Second, ship motions are highly coupled [10] and exhibit chaotic characteristics [11]. Given these factors, traditional parameter-fixed models, such as those based on Kalman filters [12] [13], and linear models, such as autoregressive (AR) models [5], cannot easily achieve high accuracy.
Neural networks are regarded as a common approach for dealing with the limitations of the above-mentioned methods. The effectiveness of neural networks for time-series prediction tasks in other fields, such as energy efficiency [14], multi-robot systems [15], and autonomous electric vehicles [16], has been validated. Neural networks, such as simple backpropagation neural networks [17], recurrent neural networks [18], radial basis function neural networks [1] [19], echo state networks [20], extreme learning machines [21], and hybrid models [22] [23], have also been widely utilized in ship motion prediction.
Given that the attitude of ships varies quasi-periodically, frequency information is also useful for improving the performance of ship motion prediction methods. Wavelet transform is a multiresolution analysis method with time-frequency localization [24], and the combination of wavelet transform and neural networks can help build frequency-aware models [25]. Three kinds of combination can be used: 1) to use wavelet functions as activation functions [26] [27], 2) to apply wavelet decomposition directly as a feature engineering tool [28] [29] [30], and 3) to embed wavelet decomposition into neural networks [25].
Despite being frequency-aware, all of the models mentioned above fail to model the coupling effects explicitly. The non-linear activation functions in neural networks may result in the coupling of the inputs, but such black-box coupling is neither controllable nor reliable; thus, the model cannot easily learn the coupling effects. Moreover, models with high complexity are time-consuming and unsuitable for real-time ship motion prediction, yet embedding wavelet decomposition into neural networks by allocating independent parallel neural networks to each sub-series and then fully connecting them leads to considerable extra complexity.
Given that the coupling effects can be expressed in the form of multivariate Taylor series expansion, and the terms beyond the second order can be omitted [31], the coupling effects can be modeled by the product of two degrees of freedom (DOFs). Ship motion has six DOFs, and the number of products is $C_6^2 = 15$ in total. For a frequency-aware model, however, the application of wavelet decomposition significantly increases the dimensionality of the input space; the number of products also increases dramatically, leading to the curse of dimensionality [32]. For example, if we decompose each DOF into six dimensions, we will need $C_{36}^2 = 630$ dimensions to represent the products.
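The growth in the number of pairwise product terms can be checked with a short calculation. This is a minimal sketch; the helper name `num_pairwise_products` is hypothetical and not taken from the paper.

```python
from math import comb


def num_pairwise_products(num_features: int) -> int:
    """Number of distinct products x_i * x_j with i < j."""
    return comb(num_features, 2)


dofs = 6
subseries_per_dof = 6  # the example used in the text

print(num_pairwise_products(dofs))                      # 15
print(num_pairwise_products(dofs * subseries_per_dof))  # 630
```

The count grows quadratically with the input dimension, which is why the product terms are not added to the input space directly.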
Thus, we cannot directly import product terms into the input space after wavelet decomposition. Instead, we use factorization machines (FM) to model the coupling effects. FM models can factorize the coefficient matrix of the products into a low-dimensional matrix to avoid the curse of dimensionality [33]. For high-order and nonlinear modeling, we also need to subsume FM under the neural network framework, resulting in neural factorization machines (NFM) [34]. In this study, the recurrent neural network (RNN) framework is utilized because RNN models are particularly good at capturing sequential features and structurally friendly to online prediction [35].
To model the coupling effects effectively while maintaining frequency and sequence awareness, this study proposes a neural network structure called multi-resolution recurrent neural factorization machines (mRNFM). In mRNFM, wavelet decomposition is used to decompose the initial inputs into sub-series. FM is utilized to model the coupling effects in low-dimensional space, and it is embedded into the RNN framework. The mRNFM model is expected to be frequency-aware and lightweight and to represent the coupling effects explicitly.
The remainder of this paper is organized as follows. The methodology is presented in Section II. The experimental results of the proposed mRNFM model are shown in Section III, and the discussion is given in Section IV. The conclusions are presented in Section V.

II. METHODOLOGY
The proposed mRNFM model is specially designed to model the coupling effects while staying frequency-aware and sequence-aware for ship motion prediction. The proposed model adopts the 6-DOF ship motion model for physical modeling, improved wavelet decomposition to stay frequency-aware, and FM in the RNN framework to model the coupling effects without the curse of dimensionality. The methodology is described as follows.

A. Ship Motions with Six Degrees of Freedom
A 6-DOF model is often built for the study of ship motions, and the ship is assumed to be a rigid body [31]. The DOFs are roll, pitch, yaw, surge, sway, and heave. The coordinate system moves with the ship. The X-axis is the longitudinal coordinate (positive forward), the Y-axis is the transverse coordinate (positive to starboard), and the Z-axis is determined by the right-hand rule. The origin is the installation position of the inertial navigation system (INS). Roll, pitch, and yaw are the rotational DOFs of the X-axis, Y-axis, and Z-axis, respectively. Similarly, surge, sway, and heave are the translational DOFs of the X-axis, Y-axis, and Z-axis, respectively, as is shown in Fig. 1. Roll is denoted by ϕ, pitch is denoted by θ, yaw is denoted by ψ, surge is denoted by x, sway is denoted by y, and heave is denoted by z.
The values of roll, pitch, and heave directly determine the ship motions. By contrast, the values of yaw, surge, and sway carry little meaning on their own. For example, a single value of yaw contains little information without knowledge of the directions of waves, winds, and currents. However, the rate of yaw matters because it contains information on motion tendency, from which the neural network can learn about the environmental factors. The rates of surge and sway also matter, although the values themselves are of little significance. Therefore, in this study, ẋ, ẏ, z, ϕ, θ, and ψ̇ are used to describe ship motions.
Ship dynamics can be generically expressed through a function $f_i(\mathbf{d})$ for each DOF (1), where $d_i$ refers to the i-th DOF, $\mathbf{d}$ is the vector of all DOFs, and $f_i$ is a complicated function. With the form of a multivariate Taylor series expansion, (1) becomes
$$f_i(\mathbf{d}) = w_{0,i} + \sum_{j} w_{j,i}\, d_j + \sum_{j} \sum_{k} w_{jk,i}\, d_j d_k + h_i(\mathbf{d}), \tag{2}$$
where $w_{0,i}$ is the bias coefficient for the i-th DOF, $w_{j,i}$ contains the linear coefficients, $w_{jk,i}$ refers to the second-order coefficients, and $h_i(\mathbf{d})$ pertains to the high-order terms with relatively small values [31]. The second-order coefficients correspond to the coupling effects. These coefficients can be regarded as constants over a short period because the dynamics of heavy objects, such as ships, change slowly.

B. Multilevel Wavelet Decomposition
Multilevel wavelet decomposition is an approach for numerical and functional analyses, and it is based on discrete wavelet transform (DWT) [36]. In time-series prediction, each prediction utilizes the data in the last sliding window. If the transformation is time-variant, the results of the transformation for adjacent windows will be discontinuous, and the output of the neural network cannot be guaranteed to be continuous. Therefore, time-invariant stationary wavelet transform (SWT) [37] is adopted in this study.
SWT is defined by a low-pass filter H, a high-pass filter G, and an upsampling operator S.
The low-pass filter H is defined by a doubly infinite sequence $\{h_n\}$, and the action of H on a doubly infinite sequence $\{x_n\}$ is defined as
$$(Hx)_k = \sum_{n} h_{n-k}\, x_n. \tag{4}$$
In this study, $\{h_n\}$ is assumed to be compactly supported so that the sum in (4) is finite.
Specifically, for the Haar wavelet, the non-zero elements are
$$h_0 = h_1 = \frac{1}{\sqrt{2}}.$$
Similar to the low-pass filter H, the high-pass filter G is defined by the sequence
$$g_n = (-1)^n h_{1-n}.$$
G is compactly supported when H is compactly supported. The support of G is the same as the support of H.
The operator S is defined as
$$(Sx)_{2j} = x_j, \qquad (Sx)_{2j+1} = 0.$$
Filters $H^{[r]}$ and $G^{[r]}$ are set to have weights $S^{[r]} h$ and $S^{[r]} g$, respectively. Then, the output of standard SWT is determined from the decomposition of sequence $x$ as follows:
$$A_0 = x, \qquad A_i = H^{[i-1]} A_{i-1}, \qquad D_i = G^{[i-1]} A_{i-1}, \qquad i = 1, 2, \ldots, J,$$
where $J$ is the level of SWT, $A$ is for approximation, and $D$ is for the details. For a finite-length sequence, periodic extension is used at the boundaries to perform computation. Assuming that $\{x_n\}$ is of length $l$, the lengths of $A_i$ and $D_i$ are all $l$ ($i = 1, 2, \ldots, J$). Here, the relation $2^J \mid l$ needs to be satisfied. Due to the upsampling operator S, the energy of sequence $A_i$ or $D_i$ is defined in terms of its elements $\{v_j\}$.
For time-series prediction tasks, data in the past are known, but data in the future are unknown. However, in standard SWT, data in the future are involved in the computation, and they are represented by periodic extension, which can cause errors. These are called boundary effects. The left boundary also has boundary effects, but the periodic extension there can be replaced by past data. Therefore, the filters need to be improved by a translation determined by $i+1$, the right support of H and G. With the translation, all of the filters have non-positive support so that the boundary effects are translated to the left boundary and then eliminated by known data in the past.
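The idea of a decomposition that uses only past samples can be illustrated with a small causal Haar example. This is a minimal sketch, not the authors' implementation: the function name `causal_haar_swt`, the sign convention of the detail coefficients, and the use of the first sample as a stand-in for data before the window are all assumptions made for illustration.

```python
import math


def causal_haar_swt(x, level):
    """Causal, undecimated Haar decomposition (illustrative sketch).

    Returns (approx, details), where approx is A_level and details[j - 1]
    is D_j. Every output has the same length as x, and output sample n
    depends only on x[0..n]. Samples before the window are approximated by
    x[0] here; in an online setting they would be real past measurements.
    """
    approx = list(x)
    details = []
    for j in range(1, level + 1):
        lag = 2 ** (j - 1)  # spacing between the two filter taps at level j
        delayed = [approx[max(n - lag, 0)] for n in range(len(approx))]
        # Details and approximation are both computed from A_{j-1}.
        details.append([(a - d) / math.sqrt(2) for a, d in zip(approx, delayed)])
        approx = [(a + d) / math.sqrt(2) for a, d in zip(approx, delayed)]
    return approx, details


signal = [math.sin(0.3 * n) for n in range(64)]
a5, d = causal_haar_swt(signal, level=5)
print(len(a5), len(d), len(d[0]))
```

Because each output sample depends only on current and earlier inputs, appending a new measurement never changes coefficients that were already computed, which is the property that matters for sliding-window prediction.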

C. Multi-resolution Recurrent Neural Factorization Machines
The mRNFM model can be viewed as a variant of NFM, and NFM utilizes FM for bi-interaction pooling of the embedded features [34]. FM estimates the output $y$ by linear combination and factorized interaction of the input vector $\mathbf{x}$ as follows:
$$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j, \tag{11}$$
where the estimated model parameters are
$$w_0 \in \mathbb{R}, \qquad \mathbf{w} \in \mathbb{R}^{n}, \qquad \mathbf{V} \in \mathbb{R}^{n \times k}. \tag{12}$$
The mRNFM model is designed specifically for ship motion prediction to improve the prediction accuracy. For ship motion data, RNN is a better choice for capturing sequential features than common DNNs [35]. Thus, mRNFM replaces the DNN structure in NFM with RNN. SWT is also applied before FM for frequency awareness. Moreover, the DNN structure in NFM only works on the bi-interaction term of one FM, but the RNN framework in mRNFM works on all of the terms in (11) for multiple FM. The reason is that mRNFM is designed for ship motion prediction with dense data, whereas NFM is designed for sparse predictive analytics of big data. The entire structure of mRNFM is shown in Fig. 3.
After these SWT procedures, we derive many sub-series of size m and their initial sequences.
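The FM estimate can be evaluated in O(kn) time by rewriting the pairwise sum, as in the standard FM formulation [33]. The sketch below is an illustration under that formulation; the function names and the toy numbers are hypothetical, and the naive double loop is included only to check the fast version.

```python
def fm_predict(x, w0, w, v):
    """Second-order FM output for one input vector, computed in O(k * n).

    x: n features; w0: bias; w: n linear weights; v: n rows of k factors.
    Uses sum_{i<j} <v_i, v_j> x_i x_j
       = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    n, k = len(x), len(v[0])
    linear = sum(w[i] * x[i] for i in range(n))
    interaction = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        interaction += 0.5 * (s * s - s_sq)
    return w0 + linear + interaction


def fm_predict_naive(x, w0, w, v):
    """Same quantity with the explicit O(k * n^2) double loop."""
    n = len(x)
    out = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            out += sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
    return out


x = [1.0, 2.0, 3.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # n = 3 features, k = 2 factors
print(fm_predict(x, w0, w, v), fm_predict_naive(x, w0, w, v))
```

Only n·k factor values are stored instead of one coefficient per feature pair, which is what keeps the interaction term tractable after wavelet decomposition enlarges the input.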

1) Input Vectors and Sub
The informative and non-redundant ones are selected, and the elements in different sequences annotated by the same subscript are combined as a vector. Then, we obtain the high-dimensional sequence $s_{t-m+1}, \ldots, s_t$.
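Combining same-index elements of several sequences into one vector per time step can be sketched as follows. The helper name `stack_subseries` and the toy values are hypothetical; the selection of informative sub-series is assumed to have been done beforehand.

```python
def stack_subseries(subseries):
    """Combine same-index elements of equally long sequences into vectors.

    subseries: list of d sequences, each of length m.
    Returns a list of m vectors, each of length d.
    """
    lengths = {len(s) for s in subseries}
    if len(lengths) != 1:
        raise ValueError("all sub-series must have the same length")
    return [list(step) for step in zip(*subseries)]


# Two toy sub-series of length m = 3 give three 2-dimensional vectors.
print(stack_subseries([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```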
and each FM has the same $k$ but different values of $w_0$, $\mathbf{w}$, and $\mathbf{v}$. The gradient of $z_i$ with respect to parameter $\theta$ is given by (14), as implemented in libFM [38]. The subscript $r$ refers to the $r$-th FM.
The multiple FM layer can be activated. The activation function is denoted by $\sigma_z(\cdot)$; it is applied to the output in (13), and the gradient is redefined according to the chain rule.
3) Hidden Layers: The hidden layers are all RNN layers, and the sequence $\{z_i\}_m$ is the input of different time steps. For each layer $q$, the output $P^q_i$ at time step $i$ is a function of the input $P^{q-1}_i$ and the output of the previous time step, $P^q_{i-1}$. For simplification, define $P^0_i = z_i$, and $P^q_0$ is initialized before calculation. $sz(q)$ is set as the length of $P^q_i$. A simple RNN (sRNN) layer applies the activation function $\sigma_q$ to a weighted combination of $P^{q-1}_i$ and $P^q_{i-1}$. The parameters to be estimated are the coefficients of this combination for each layer.
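A simple RNN layer of the kind described above can be sketched in a few lines. This is an illustration in the standard sRNN form, not the authors' code: the parameter names `W`, `U`, `b`, the initial state `h0`, and the choice of tanh as activation are assumptions.

```python
import math


def srnn_layer(inputs, W, U, b, h0):
    """Run a simple RNN layer over a sequence (illustrative sketch).

    inputs: list of m input vectors, each of length n_in
    W: n_out x n_in input weights; U: n_out x n_out recurrent weights
    b: n_out biases; h0: initial output of length n_out
    Returns the list of m output vectors, one per time step.
    """
    outputs, h_prev = [], list(h0)
    for x in inputs:
        h = []
        for r in range(len(b)):
            pre = b[r]
            pre += sum(W[r][j] * x[j] for j in range(len(x)))
            pre += sum(U[r][j] * h_prev[j] for j in range(len(h_prev)))
            h.append(math.tanh(pre))  # tanh is an assumed activation
        outputs.append(h)
        h_prev = h  # the current output feeds the next time step
    return outputs


seq = [[1.0], [0.0]]
print(srnn_layer(seq, W=[[1.0]], U=[[1.0]], b=[0.0], h0=[0.0]))
```

Stacking layers amounts to feeding the returned sequence of one layer as `inputs` to the next.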

4) Output Layer:
The output layer and the hidden layers are designed to be homogeneous.
For each time step, the output corresponds to the predicted values of the following n steps. The length of the output is set as the prediction length n, and the output of the last time step is the predicted result of the model.

5) Learning:
The mean square error (MSE) loss is taken as the loss function as follows:
$$\mathrm{MSE}(\hat{y}, y) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2,$$
where $n$ is the length of $\hat{y}$ and $y$.
The RMSProp optimizer [39] is used in this study. The gradients are calculated with the BPTT algorithm [35].
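The loss and a single optimizer update can be sketched as below. This is a minimal illustration of the MSE loss and the standard RMSProp rule; the learning rate, decay, and epsilon values are assumed defaults, not settings reported in this study, and the function names are hypothetical.

```python
def mse_loss(y_pred, y_true):
    """Mean square error between two equally long sequences."""
    n = len(y_true)
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n


def rmsprop_step(params, grads, cache, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp update; returns the new (params, cache).

    cache holds the running average of squared gradients per parameter.
    """
    new_params, new_cache = [], []
    for p, g, c in zip(params, grads, cache):
        c = decay * c + (1.0 - decay) * g * g
        new_params.append(p - lr * g / (c ** 0.5 + eps))
        new_cache.append(c)
    return new_params, new_cache


print(mse_loss([1.0, 2.0], [1.0, 4.0]))  # 2.0
params, cache = rmsprop_step([1.0], grads=[2.0], cache=[0.0])
print(params, cache)
```

In practice the gradients passed to the update would come from BPTT through the RNN layers.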

6) Online Prediction:
The mRNFM structure is friendly to online prediction because of the time-invariant modified SWT without boundary effects and the RNN structure inside. Only one-step calculation is needed for each new prediction, as presented in Fig. 4. Online training can be performed at given frequencies to help model the time-varying dynamics.

D. Relation to Existing Models
1) Relation to NFM and Other FM-related Models: NFM and other FM-related models are usually used in recommendation systems for sparse data analytics, whereas mRNFM is specially designed for ship motion prediction. Despite being used in different fields, FM in these models plays the same role: dimensionality reduction. In mRNFM, FM also directly models the coupling effects.
2) Relation to Different RNN Models: mRNFM utilizes the RNN framework, and different RNN models can all work well in mRNFM. In this study, the sRNN model is used for demonstration.

III. EXPERIMENTAL RESULTS
The proposed mRNFM model is applied to data collected from a ship maneuvering test. The sampling frequency of the data is 10 Hz, and the length of the data is 5 hours.
Several statistical parameters are shown in Table I, with the mean values of roll ϕ, pitch θ, and heave z set to zero.

A. Preprocess
Table I shows that all of the features, except for the rate of surge ẋ, can be viewed as zero-centered with their range in the same order of magnitude. For a sailing ship, ẋ is usually positive because it indicates the sailing velocity. However, for improved performance of the neural network, we need to normalize ẋ as follows:
$$\dot{x} \leftarrow \frac{\dot{x} - \mathrm{Mean}(\dot{x})}{\mathrm{Std}(\dot{x})},$$
where Mean(ẋ) = 6.34 and Std(ẋ) = 0.357 according to Table I. Then, modified SWT is applied to estimate the features, and the choice of level is important for effective and non-redundant representation. For the selection of level $J$, the following criteria need to be satisfied.
1) The high-frequency part that has low energy and little information but much noise should be separated and then discarded.
2) The energy of $A_J$ or $D_J$ should be moderate so that the frequency domain is fully separated without too much redundancy.
3) The energy of approximation $A_J$ should be larger than the energy of the details $D_J$.
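As a small aside on the preprocessing step, the normalization of ẋ described earlier in this subsection amounts to a standard z-score. The sketch below uses the Mean(ẋ) and Std(ẋ) values quoted from Table I; the constant and function names are hypothetical.

```python
MEAN_SURGE_RATE = 6.34   # Mean(x_dot) quoted from Table I
STD_SURGE_RATE = 0.357   # Std(x_dot) quoted from Table I


def normalize_surge_rate(x_dot):
    """Z-score normalization of the surge rate."""
    return (x_dot - MEAN_SURGE_RATE) / STD_SURGE_RATE


print(normalize_surge_rate(6.34))  # 0.0
```

After this step the surge rate is zero-centered with unit spread, like the other five features.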
Haar wavelet [40] is utilized in this study to achieve a simple but effective implementation. For each feature, the energy of $A_i$ and $D_i$ for $i = 1, 2, \ldots, 7$ is computed and then divided by the energy of the raw data, as shown in Fig. 5.
According to Fig. 5, level J = 5 is a good choice for each feature in consideration of the

B. Ablation Study
An ablation study is conducted through an experiment to prove the efficiency of mRNFM.
The following types of neural networks are implemented:
1) mRNFM: proposed in this study;
2) RNN-36: mRNFM without the multiple FM layer;
3) RNFM-6: mRNFM without the SWT process;
4) RNN-6: mRNFM without the SWT process and the multiple FM layer.
Among the models above, RNN-36 takes the 36 sub-series as the input of a common RNN model, RNFM-6 takes the 6 initial sequences as the sub-series, and RNN-6 takes the 6 initial sequences as the input of a common RNN model. The network parameters are listed in Table II.

D. Metrics and Results
The metrics are the RMSE of the prediction of roll, pitch, and heave over the following 10 seconds. The number of trainable parameters in each model is also reported. The results are shown in Table III. According to the test results, the proposed mRNFM model achieves the best performance, although its number of trainable parameters is slightly larger than that of the other models. The test results indicate that the combination of multilevel wavelet decomposition and FM provides improved predictions, but using only one of them cannot ensure a small loss.
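The metric and the relative reductions quoted for it can be computed as sketched below. The function names are hypothetical, the numbers in the usage lines are toy values rather than results from Table III, and the horizon of 100 samples is an inference from the 10 Hz sampling rate and the 10-second window, not a figure stated in the text.

```python
import math


def rmse(y_pred, y_true):
    """Root-mean-squared error between two equally long sequences."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n)


def relative_reduction(baseline_rmse, model_rmse):
    """Percentage by which model_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


SAMPLING_HZ = 10   # sampling frequency of the data
HORIZON_S = 10     # prediction window in seconds
horizon_steps = SAMPLING_HZ * HORIZON_S  # assumed number of predicted samples

print(horizon_steps)                 # 100
print(relative_reduction(2.0, 1.5))  # 25.0
```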

E. Multi-stage Training Trick
During training, the optimizer may converge to a relatively high loss because the bi-interaction terms in the multiple FM layer can affect the data distribution, which is usually maintained by batch normalization [41] in common DNN models, leading to poor gradients. For instance, if $x \sim \mathcal{N}(0, 1)$, $x^2$ does not follow the Gaussian distribution, and it cannot be simply dealt with by reducing the internal covariate shift because the distribution is skewed.
Such a problem can be solved by multi-stage training. According to (11), the multiple FM layer degenerates to a full-connection (FC) layer when the parameter v of each FM is set to zero.
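The degeneration described above can be checked numerically. This is a minimal sketch under the standard second-order FM formulation [33]: with all factors set to zero the output reduces to the linear (FC-like) part, and the closed-form gradient with respect to each factor also vanishes. The function names and toy values are hypothetical.

```python
def fm_output(x, w0, w, v):
    """Second-order FM output with an explicit pairwise loop."""
    n = len(x)
    out = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            out += sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
    return out


def fm_grad_v(x, v, i, f):
    """d(output)/d(v[i][f]) = x_i * sum_j v_jf x_j - v_if * x_i^2."""
    s = sum(v[j][f] * x[j] for j in range(len(x)))
    return x[i] * s - v[i][f] * x[i] ** 2


x = [0.5, -1.0, 2.0]
w0, w = 0.1, [1.0, 2.0, 3.0]
v_zero = [[0.0, 0.0] for _ in x]  # k = 2 factors, all zero

linear_only = w0 + sum(wi * xi for wi, xi in zip(w, x))
print(fm_output(x, w0, w, v_zero), linear_only)
print([fm_grad_v(x, v_zero, i, f) for i in range(3) for f in range(2)])
```

Since the factor gradients are exactly zero at this point, a stage trained with zero factors only updates the bias and linear weights.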
At the same time, the gradients of $v$ become zero according to (14),
structure in MSA-LSTM is maintained, but the SWT layer and the output shape are changed to be consistent with the models in the ablation study.
Fig. 10.
For different models, the strength may change greatly because the objectives of prediction (roll, pitch, or heave) are different. Strong coupling effects may exist for several pairs, but Moreover, aside from coupling DOFs, coupling frequencies can also be observed in Fig. 10.
For example, parts $D_2$ and $D_3$ of yaw motion can strongly affect the prediction of roll, and parts $D_2$ and $D_3$ of surge motion can affect the prediction of heave.

C. Time and Space Complexity
For the SWT part, the time complexity is equal to that of $2J$ one-dimensional convolutions, where $J$ is the level of SWT, and the same applies to the space complexity. Therefore, compared with the complexity of neural networks, the complexity of SWT can be ignored.
For each FM, the computation and parameter updates are both O(kn) [33], where n is the number of sub-series. The space complexity of each FM is O(kn + n + 1) according to (12).
For the multiple FM layer, the time complexity is $O(Rkn)$, where $R$ is the number of FM units, and the space complexity is $O(R(k + 1)n)$.
For an sRNN layer with $R$ units and $n$-dimensional input, the time complexity is $O(R^2 + Rn)$, and the space complexity is $O(R^2 + Rn)$. Therefore, the complexity of the multiple FM layer is close to the complexity of an sRNN layer when $k = 2$, $n = 36$, and $R = 72$ (parameters in the ablation study).
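Plugging the ablation-study values into the expressions above gives a quick comparison. This sketch only evaluates the leading-order terms quoted in the text (constant factors and biases are ignored); the function names are hypothetical.

```python
def multi_fm_time(R, k, n):
    """Leading-order time cost R * k * n of the multiple FM layer."""
    return R * k * n


def multi_fm_space(R, k, n):
    """Leading-order space cost R * (k + 1) * n of the multiple FM layer."""
    return R * (k + 1) * n


def srnn_cost(R, n):
    """Leading-order time and space cost R^2 + R * n of an sRNN layer."""
    return R * R + R * n


k, n, R = 2, 36, 72  # parameters in the ablation study

print(multi_fm_time(R, k, n))   # 5184
print(multi_fm_space(R, k, n))  # 7776
print(srnn_cost(R, n))          # 7776
```

With these values the multiple FM layer has the same leading-order space cost as the sRNN layer and a somewhat lower leading-order time cost.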

V. CONCLUSION
The accuracy of ship motion prediction by traditional neural networks is limited by the coupling effects in ship motions. To solve this problem, this study explicitly models the coupling effects of ship motions in neural networks while remaining frequency-aware and sequence-aware.
A novel mRNFM structure that combines multilevel wavelet decomposition, FM, and RNN is proposed for the neural network algorithm. SWT is optimized and applied to decompose the sequences into different sub-series, thereby making the model frequency-aware. FM is utilized to model the bi-interactions of different sub-series with low complexity, thus modeling the coupling effects effectively. The RNN structure is fed by the output of the multiple FM layer so that the model becomes sequence-aware and structurally friendly to online prediction.
The experimental study and discussion prove that the proposed mRNFM structure has the following advantages. First, the proposed method makes accurate predictions with low time and space complexity. Second, the proposed method explicitly and effectively models the coupling effects in ship motions, thereby possessing natural interpretability. Third, the proposed method models the interactions between high-frequency and low-frequency parts of the signals. Lastly, the proposed method can directly improve existing RNN models through the multi-stage training trick.
We hope that this work will encourage researchers to explore and improve the proposed method further in the following aspects.
1) Different wavelet functions may have different effects when utilized in the proposed structure; thus, the choice of the wavelet function is worthy of further study.
2) Aside from sRNN, many other neural networks can be applied to mRNFM, which calls for additional experiments.
3) Future studies can also apply the proposed method to other time series tasks, such as EEG and pose recognition.

ACKNOWLEDGMENT
This work was supported by China's National Training Program of Innovation and Entrepreneurship for Undergraduates.