Revisiting linear regression of dynamical systems within the context of Zwanzig-Mori theory: tests on a simple system

— Linear regression can be applied to time series data to extract model parameters such as the effective force and friction constant matrices of the system. Even highly nonlinear systems can be analyzed by linear regression, if the total amount of data is broken up into shorter “time windows”, so that the dynamics is considered to be piece-wise linear. Traditionally, linear regression has been performed on the equation of motion itself (which approach we refer to as LRX). There has been surprisingly little published on the accuracy and reliability of LRX as applied to time series data. Here we show that linear regression can also be applied to the time correlation function of the dynamical observables (which approach we refer to as LRC), and that this approach is better justified within the context of statistical physics, namely, Zwanzig-Mori theory. We test LRC against LRX on a simple system of two damped harmonic oscillators driven by Gaussian random noise. We find that LRC allows one to improve the signal to noise ratio in a way that is not possible within LRX. Linear regression using time correlation functions (LRC) thus appears to be not only better justified theoretically, but it is more accurate and more versatile than LRX.


I. INTRODUCTION
First introduced by Legendre and Gauss, least squares regression is a common first step in building a mathematical model of natural phenomena [1]. Least squares regression when applied to dynamical systems usually involves regression applied to the dynamical equation of motion itself. When the model parameters appear linearly in the postulated equation of motion, least squares regression becomes linear regression. Granger causality [2] is an example of linear least squares regression, while dynamic causal modeling [3] is an example of potentially nonlinear least squares regression. Complex systems such as the brain are nonlinear at macroscopic timescales. However, even highly nonlinear systems can be This paragraph of the first footnote will contain the date on which you submitted your paper for review, which is populated by IEEE. considered to be piece-wise linear at shorter timescales, because if the energy surfaces on which the dynamics moves can be assumed to be differentiable, then the effective potential energy can be Taylor-expanded to second order at every point. One can then divide macroscopic timescales into shorter time windows, within each of which dynamics is linear. Such an "instantaneous normal mode" approach has proved useful for analysis of liquid state dynamics [4][5][6].
In applications to experimental time series data, such dynamical analysis can help us not only to better understand the dynamics of these systems, but also how to modify the dynamics through externally applied forces. For example, if by dynamical analysis of electroencephalogram data, we understand the effective macroscopic "forces" at play in triggering and maintaining a "seizure trajectory" in the brain, we may be able to design "counter-forces" to disrupt that trajectory using invasive electrical stimulation of the brain. This kind of analysis applied to brain stimulation can result in intelligent prediction and control algorithms, for example, using Kalman filters [7,8], that would move the field beyond the current empirical trial-and-error standard [9,10]. More broadly, dynamical analysis can be applied to any other time series data, such as functional magnetic resonance imaging or magnetoencephalogram data. It can help identify spontaneous networks underlying both normal and abnormal brain function, which would lead to better understanding of brain function in health and in disease.
Although least squares regression is widely applied to dynamical data, there is surprisingly little published on how well it performs on test systems. Such testing would need to be performed on computer generated trajectories of mathematical models for which the correct model parameters are known. In this study, we investigate how well least squares regression works for nearly the simplest possible dynamical model, that of two simple harmonic oscillators which move under the influence of friction and Gaussian random forces. We find that the conventional approach to least squares regression uniformly fails in its estimate of the effective friction constant for a dynamical system at thermal equilibrium. We suggest an alternative approach to least squares regression, where regression is applied to the equation of motion for the time correlation functions of the experimental observables (rather than on the experimental observables themselves) that may yield more accurate results. The motivation for looking at time correlation functions comes from revisiting the physical justification of least squares regression.

II. THEORY
Least squares regression can be grounded in physical law through Zwanzig-Mori theory [11][12][13], which connects microscopic dynamics to macroscopic observables. We refer to macroscopic observables as the explicit degrees of freedom, and all other degrees of freedom as implicit degrees of freedom. Projection operators then "project out" the implicit degrees of freedom such that one is left with an equation of motion for the explicit degrees of freedom that depends on the implicit degrees of freedom through complicated functions. This equation has a form equivalent to the generalized Langevin equation [14], which can be written: Here ( ) is an N-dimensional time-dependent vector where each element of the vector is an explicit degree of freedom; ( ) = ( ) − 〈 ( )〉 where 〈… 〉 indicates a time average over a particular time window; ̇( ) and ̈( ) are the first and second time derivatives of ( ); [ , , ( )] are the harmonic force constant, friction and memory kernel matrices, respectively; ( ) is a random force term; and the discretized time step is . The memory kernels ( ) are time-lag matrices that determine the effect of prior values of ̇( − ) from m time steps in the past on the current value of the force, ̈( ). We are using mass weighted coordinates so that mass is absorbed into the coordinate.
In the engineering literature, it is more common to postulate an equivalent linear equation: Here ( ) are × time-lag matrices that determine the effect of prior values ( − ) from m time steps in the past on the next future value of the explicit degree of freedom, ( + ), and ′( ) is an N-dimensional error term. The matrices ( ) can be related to the matrices [ , , ( )] by comparing terms in (1) and (2). For example, for = 1, we have: Conventional least squares regression then consists of regarding either ( ) or ′ ( ) as the errors in the equations of motion in (1) and (2), then choosing either the , and ( )'s so as to minimize the sum of the squares of ( ) over some finite time window, or the ( )'s so as to minimize the sum of the squares of ′( ) over the same time window. Let us refer to this (conventional) type of least squares regression as LRX. Solutions for = [ , , ] using LRX are derived in Appendix A.
Within Zwanzig-Mori theory, the random force terms ( ) and, equivalently, ′( ), represent contributions from the implicit degrees of freedom which are not necessarily small. Thus there is no physical justification for the assumption underlying LRX [8]. On the other hand, a defining characteristic of the random force, ( ), is that it is not correlated with ( ) when averaged over phase space [11][12][13]. If we further invoke the ergodic hypothesis [15], which replaces a phase space average with a time average, then one can apply least squares regression to time correlation functions of the explicit degrees of freedom, rather than the explicit degrees of freedom themselves. The error to be minimized is then the square of the time correlation between ( ), which represents signal, and ( ), which represents noise. In effect we are defining noise to be that which is not time-correlated with signal. Let us refer to this type of least squares regression as LRC. Solutions for = [ , , ] using LRC are derived in Appendix B.
Notably, both LRX and LRC are constrained by the ergodic hypothesis; that is, if the system is trapped in a local part of phase space by energy barriers, then one cannot obtain information about any part of phase space that is not being sampled in that time window. In comparing the theoretical justifications of LRX and LRC, the main difference is that LRX additionally requires that the random force be small compared to the forces due to = [ , , ].
In what follows, we drop the memory terms ( ) with the goal of understanding the simpler system first. The memory terms are certain to be important in complex systems such as the brain, but G may be viewed as the zeroth order memory term in (1); it is typically larger than higher order memory terms.
It is useful to monitor the total instantaneous energy and time rate of change of the energy. Within any given time window, the total energy of the explicit degrees of freedom is given by: Here the superscript T indicates a transpose. The time-averaged time-rate of change of the total energy is then: Here indicates a matrix trace and: The first term in (6) involves ̅ , which is an antisymmetric matrix. Force constant matrices that are derivable from a differentiable potential energy are symmetric, in which case ̅ = 0. In the general case for a macroscopic system, we leave open the possibility that ̅ is not necessarily symmetric. The second term in (6) is the time rate of energy loss due to friction, i.e., the dissipation: At steady state, the dissipation term must be balanced by the third term in (6), which is the rate of energy influx due to the random force. The lowest order contribution to the third term requires expansion of the current velocity to first order in the time step, . The velocity at the current time is then: Inserting (10) into (6), then using (1) and keeping only the lowest order terms in , yields the fluctuation-dissipation theorem for the simple Langevin equation [14]:

III. COMPUTER-GENERATED MODEL
We tested LRX vs LRC on computer generated modeling of a system of two damped harmonic oscillators obeying the equation of motion: where: The choice of in (13) (6) is tracked during each run so that we know whether the system has reached steady state. For the values that we tested, a time of 40 seconds is long enough to insure that thermal equilibrium has been reached.
In order to simulate the effect of an external perturbation on the performance of LRX and LRC, at time = 250 sec for each simulation run, we turn on 50 Hz sinusoidal stimulation with amplitude of 1000. This choice of parameters models cortical mapping as performed on patients undergoing epilepsy surgery workup [16,17]. The stimulation is turned on and off in consecutive alternating time windows, and LRX and LRC are applied to time windows when the stimulation is off. An additional 1500 time windows are analyzed in this way, for a total of 15,250 seconds of data analyzed (250 seconds of freerunning data, plus 15,000 seconds of post-stimulation data) for each net simulation run.
For the post-stimulation analysis using LRC, there is an additional option of cumulatively averaging the time correlation functions calculated in each post-stimulation time window. For instance, let ( ), indicate the time correlation function corresponding to the i th time window, and let indicate the number of post-stimulation time windows acquired thus far, where ranges from 1 to 15,000. Then define: An effective K and G are then calculated as in Appendix B, but using ( ; ) rather than the individual ( )'s. Such a cumulative average is not available within LRX.
IV. RESULTS Figure 1 shows the relative errors in LRX and LRC estimates of K vs time with 0 varied over values of 0.01, 1 and 100, and with the true 0 held fixed at 1 Hz. Each point on these graphs represents the relative error of an estimated value for K within one time window, with each time window being 10 seconds in duration. External stimulation is turned on for consecutive alternating time windows starting at a time of 250 seconds. For LRC, a cumulative average of the time correlation function as defined in (15) is calculated, starting with the first poststimulation time window. This cumulatively averaged time correlation function is then used to calculate the LRC estimates of K and G. Figure 1 shows that both LRX and LRC produce reasonably accurate estimates of K in the first time window (t = 10 sec), at thermal equilibrium (t = 40 to 250) and in the post-stimulation time windows (t = 250 to 15,250). However, the LRX estimate for K shows progressive deterioration in the post-stimulation time windows. This effect is due to a heating-up effect of the externally applied stimulation. With repeated applications of the externally applied stimulation, the overall thermal energy increases, which would then be expected to reduce the signal to noise ratio. In contrast, the LRC estimate for K shows an abrupt improvement with the post-stimulation time-windows which is maintained with increasing trials of stimulation. As the root > TNNLS-2021-P-19190 mean square amplitude of the random force, 0 , is increased, the main effect is that larger 0 leads to less accurate estimates of , which is to be expected as larger 0 results greater thermal noise and a smaller signal to noise ratio.   (15) is calculated, starting with the first poststimulation time window. This cumulatively averaged time correlation function is then used to calculate the LRC estimates of K and G. Figure 2 shows that both LRX and LRC produce reasonably accurate estimates of G in the first time window, when the system is not thermally equilibrated, and that both perform poorly after the system is thermally equilibrated (t = 40 to 250), with errors of ~100% at thermal equilibrium. When external stimulation is applied, thus driving the system away from thermal equilibrium, both LRX and LRC improve in their estimates of G, more so when the random noise is small or moderate ( 0 = 0.01 or 1), less well when it is large ( 0 = 100). LRX again shows a heating-up effect where repeated applications of externally applied stimulation result in gradually degraded estimates of G. In contrast, not only is the heating-up effect absent for LRC, but there is a slow gradual improvement in the estimated G as more post-stimulation time windows are accumulated.

Fig. 2.
Top panel: relative error in G using LRX with 0 = 0.01, 1, and 100. Bottom panel: relative error in G using LRC with 0 = 0.01, 1, and 100. Note Log-Log scales for relative error and time. The true 0 is held fixed at 1 Hz. Figure 3 shows the relative errors in LRX and LRC estimates of K vs time with the true 0 varied over values of 0.01, 1 and 100, and with 0 held fixed at 1. Each point on these graphs represents the relative error of an estimated value for K within one time window, with each time window being 10 seconds in duration. External stimulation is turned on for consecutive alternating time windows starting at a time of 250 seconds. For LRC, a cumulative average of the time correlation function as defined in (15) is calculated, starting with the first post-stimulation time window. This cumulatively averaged time correlation function is then used to calculate the LRC estimates of K and G. Figure 3 shows that both LRX and LRC produce reasonably accurate estimates of K in the first time window (t = 10 sec), at thermal equilibrium (t = 40 to 250) and in the post-stimulation time windows (t = 250 to 15,250), as long as 0 is not too large. At large 0 , the signal is damped too fast, thus reducing the number of oscillatory cycles the signal exhibits before the oscillation becomes to small to be detected. In addition, the LRX estimate for K shows progressive deterioration in the poststimulation time windows. This effect is due to a heating-up effect of the externally applied stimulation. With repeated applications of the externally applied stimulation, the overall thermal energy increases, which then reduces the signal to noise ratio. In contrast, the LRC estimate for K shows an abrupt improvement with the post-stimulation time-windows which is maintained with increasing trials of stimulation.   (15) is calculated, starting with the first post-stimulation time window. This cumulatively averaged time correlation function is then used to calculate the LRC estimates of K and G. Figure 4 shows that both LRX and LRC produce reasonably accurate estimates of G in the first time window, when the system is not thermally equilibrated, and that both perform poorly after the system is thermally equilibrated (t = 40 to 250), with errors of ~100% at thermal equilibrium. When external stimulation is applied, thus driving the system away from thermal equilibrium, both LRX and LRC improve in their estimates of G. LRX again shows a heating-up effect where repeated applications of externally applied stimulation result in gradually degraded estimates of G. In contrast, the heating-up effect is absent for LRC; for moderate values of the friction ( 0 =1), there is a slow gradual improvement in the estimated G as more post-stimulation time windows are accumulated. Note that relative errors for 0 are largest both for very small and very large values. For small values of the true 0 , an accurate estimate is more challenging because the relative error is taken with respect to the true value of 0 , so that a small absolute error translates to a large relative error. For large values of 0 , an accurate estimate is challenging because the signal damps out faster, leaving a shorter time duration of the damping dynamics to analyze. It is instructive to look at the relative errors using (1) vs (2). If one uses (2) to fit time series data, it is possible to be fooled into thinking one has an accurate solution if one calculates only the relative errors in the estimates for (0) and (1). To show this, consider the case of two uncoupled harmonic oscillators with frequencies of 5 and 222 Hz, with 0 = 1 Hz, and 0 = 0.005. The relative errors for (0) and (1) are very small whether extracted using either LRX or LRC (see Table 1). However, if these values are converted to the corresponding K's and G's, one would see that the friction constant extracted using LRX is off by 13.98%. The reason that estimates for the relative errors in A(0) and A(1) can be so misleading is because A(0) and A(1) are to lowest order in the timestep, , not dependent on either K or G (see (3) and (4)). In additional efforts to improve the estimates of K and G when the system is at steady state, we tried using both longer but fewer time windows, and shorter and more time windows, such that the total time remained the same. We found that neither longer nor shorter time windows improved the estimates. We also tried accumulating individual K's and G's, each one extracted from individual time windows, then averaging these K's and G's across time windows. This procedure also failed to improve the estimates. That these two procedures failed to improve the estimated K's and G's is not surprising, given that neither procedure improves the signal to noise ratio. For instance, a longer time window will include both more signal and more noise.
We additionally tested other values of the K and G matrices, including a wider frequency gap and both diagonal and nondiagonal versions. Results were concordant with those presented above.

V. DISCUSSION
For the highly simplified dynamical system consisting of two coupled, damped harmonic oscillators which start from a nonequilibrium initial condition and is then allowed to evolve under the influence of a Gaussian random force, both the standard approach to linear regression of dynamical systems (LRX) and the time correlation function approach proposed here (LRC) perform well in extracting an effective force constant matrix (K), but both perform poorly in extracting an effective friction constant (G), if the system is allowed to come to thermal equilibrium. The reason that friction constant estimates are poor at thermal equilibrium is because a defining feature of steady state dynamics is that energy flow into a system (largely due to random noise forces) must on average be balanced by the energy flow out of the system (largely due to friction). This equality is the basis of the fluctuation-dissipation theorem [14]. In this case, steady state is reached when the magnitudes of the random and frictional forces are equal, which makes it difficult to separate random from frictional forces. With this insight, it becomes obvious that estimates of the friction constant can be improved by perturbing the system away from thermal equilibrium, for example, by applying an external perturbation, or by averaging the equilibrium time correlation function across multiple time windows, as in (15). Averaging the time correlation function increases the signal to noise ratio in the time correlation function and improves LRC estimates of the friction constant. An analogous averaging process is not available within the LRX framework, as averaging the explicit degrees of freedom themselves merely results in constants which contain no dynamical information.
A caveat for repeat applications of an external stimulation is that over time, the system can "heat up," and increase the amount of background thermal noise. This increased thermal noise degrades estimates of both the force constant and friction matrices within the LRX framework but not within the LRC framework. A caveat for averaging the time correlation function across multiple time windows within the LRC framework is that one must insure that the system is in the same "state" within each time window analyzed. For example, if the system is highly nonlinear (as the brain is), then it may enter a different state at certain times, each with its own best force constant and friction matrices. The pairing of a standardized protocol for application of an external stimulus followed by post-stimulus sampling may help insure that the system under study is in the same state for each time window sampled.
The effective friction is an important model parameter for all real-world applications. In applications to neural systems, for instance to the detection and monitoring of neural activity via electroencephalograms, magnetoencephalograms or functional magnetic resonance imaging, we hypothesize that (1) the force constant matrix (K) represent direct interactions between principal neuronal groups, the eigenstates of which represent neural patterns of activation by which information or neural codes may be stored; (2) the friction matrix (G) represent inhibitory interactions that act as a brake on the activity of principal neurons; and (3) the random forces R represent energy influx that activates the neural patterns, and provides a balance to the dissipative forces of the friction matrix. One may hypothesize, for instance, that frictional forces are abnormally low just before onset of an epileptic seizure [18]. Thus it is important to have a more reliable estimate of the effective friction constant.
For dynamical systems that are intrinsically nonlinear, it may not always be possible to choose a short enough time window such that the dynamics becomes piece-wise linear. In this case, one may still be guided by the principle that the random force term is best defined as that which is not timecorrelated with the experimental observables, and then the model parameters are again chosen variationally so as to minimize the mean square of the time correlation of the random force with the experimental observables; that is, one can variationally minimize the term in (B5) even when the equation of motion is nonlinear. The minimization process may become much more difficult for nonlinear model systems.

VI. CONCLUSION
Our simulations suggest that linear regression performed on time correlation functions of experimental variables (LRC) is a useful alternative to more traditional methods applying linear regression to the experimental variables themselves (LRX). In particular, we find that estimates of an effective friction constant from dynamical data are unreliable unless the experimental system is either perturbed away from equilibrium using either LRX or LRC, and/or if the estimate is derived from averaging the time correlation function of the dynamic variables across multiple time windows using LRC. If one employs an empirical dynamical model such as (2), as is common in the engineering literature, it is possible to be fooled into thinking one's estimates of model parameters are accurate, > TNNLS-2021-P-19190 when in fact the corresponding estimated friction constant has a relative error on the order of 100%.
We regard our results as exploratory. Future studies may address the best ways of perturbing the model system away from equilibrium so as to obtain more reliable estimates of the force constant and friction constant matrices. Studies of more complex systems [19][20][21] and incorporation of memory effects (the H-terms in (1)) are also needed. In preliminary studies, we found that introducing H-terms can corrupt estimates for the force constant matrix (K) and the friction constant matrix (G). For instance, trajectories generated by simple damped harmonic oscillators (without memory, namely, where all the H-terms are zero) can result in LRX and LRC estimates where the estimated H-terms were dominant rather than being zero, with corresponding degradation of the estimated K's and G's. Similar results were found in by Darve et al. [22]. L2 regularization [23] failed to remove this flaw. We suspect that further physically-based constraints need to be imposed on the H-terms, for instance, that higher order terms in the H series are damped in time by an exponential envelope.
Finally, in the context of machine learning and artificial intelligence approaches to the analysis of dynamical systems, the LRC approach can be regarded as complementary, in that time averaging of time correlation functions systematically improves signal-to-noise ratio, and the time-averaged time correlation functions themselves can be regarded as inputs to machine learning and artificial intelligence algorithms. Here the superscript signifies a matrix transpose, 〈… 〉 signifies a time average over this time window, and signifies a matrix trace. The optimal that minimizes the total error is then given by: Taking the second time derivative of (B2) and inserting (B1) yields: where ( ) is given by: Within Zwanzig-Mori theory, the correlation in phase space of the random force with the explicit degrees of freedom is zero [16][17][18]. If we additionally invoke the ergodic hypothesis and replace the phase space correlation with a time correlation, then