Statistical-mechanical analysis of adaptive filter with clipping saturation-type nonlinearity

In most practical adaptive signal processing systems, e.g., active noise control, active vibration control, and acoustic echo cancellation, substantial nonlinearities that cannot be neglected exist. In this paper, we analyze the behaviors of an adaptive system in which the output of the adaptive filter has the clipping saturation-type nonlinearity by a statistical-mechanical method. We discuss the dynamical and steady-state behaviors of the adaptive system by asymptotic analysis, steady-state analysis, and numerical calculation. As a result, it becomes clear that the saturation value has a critical point at which the system's mean-square stability and instability switch. The obtained theory well explains the peculiar behaviors around the critical point observed in computer simulations. Finally, the exact value of the critical point is also derived.


I. INTRODUCTION
Adaptive signal processing is used in a wide range of areas such as communication systems and acoustic systems [1], [2]. Active noise control (ANC) [3]–[6], active vibration control (AVC) [7], acoustic echo cancellation [8], and system identification [9] are examples of specific applications of adaptive signal processing. Adaptive signal processing using linear filters has been theoretically analyzed [1], [2].
The components of most practical adaptive systems, such as power amplifiers and transducers such as loudspeakers and microphones, have substantial nonlinearities that cannot be neglected [1], [2]. Such nonlinearities are inevitable, and it is extremely important to investigate their effects on the overall performance of adaptive systems in detail. Therefore, there have been many studies on adaptive signal processing systems including nonlinear components [10]–[31]. In some of these studies, nonlinearities where an input signal and an error signal are expressed by their signs (±1) or three values (−1, 0, +1) have been investigated [10]–[20]. Note, however, that such nonlinearities are intended to reduce computational complexity. Bershad [21] analyzed the case in which the update by the least-mean-square (LMS) algorithm [32] has a (1 − e^{−ax}) saturation-type nonlinearity, assuming a small step size. Costa et al. [22] analyzed the case in which the output of the adaptive filter has an error function (erf) saturation-type nonlinearity, assuming a small step size. (Manuscript received April 19, 202x; revised December 11, 202x. This work was supported by JSPS KAKENHI Grant Number JP20K04494.) Costa et al. [23], [24] analyzed
ANC in which the secondary path has an erf saturation-type nonlinearity. Snyder and Tanaka [25] proposed replacing the finite-duration impulse response (FIR) filter with a neural network to deal with the primary-path nonlinearity in ANC/AVC. Costa [26] analyzed a hearing aid feedback canceller with an erf saturation-type nonlinearity. Costa et al. [27] analyzed a model in which the output of the adaptive filter has a dead-zone nonlinearity caused by a class B amplifier or a nonlinear actuator, assuming a small step size. Tobias and Seara [28] analyzed the behaviors of the modified LMS algorithm derived from an improved cost function in the case where the output of the adaptive filter has an erf saturation-type nonlinearity. Bershad [29] analyzed the case where the update by the LMS algorithm has an erf saturation-type nonlinearity and extended the analysis to the tracking of a Markov channel in the context of system identification. As described so far, there have been many studies on adaptive systems with an erf saturation-type nonlinearity. On the other hand, Hamidi et al. [30] reported their analysis, computer simulation, and experimental results for an ANC model in which the output of the adaptive filter has the clipping saturation-type nonlinearity. They proposed a modification of the cost function to avoid using the nonlinear region, thereby improving the adaptive algorithm. Stenger and Kellermann [31] proposed the use of clipping-type preprocessing in adaptive echo cancellation to cancel the effect of nonlinear echo paths.
Meanwhile, our group has recently studied the application of the statistical-mechanical method [33] to the analysis of adaptive signal processing. In traditional statistical analysis, it is generally necessary to use a number of approximations and assumptions to compute expectations with respect to the input signal, which is a random variable. In statistical-mechanical analysis, by contrast, by considering the large-system limit, the universal properties of a system consisting of many microscopic variables can be discussed simply, macroscopically, and deterministically in terms of a small number of macroscopic variables. In addition, since the law of large numbers and the central limit theorem hold, many calculations required in the analysis become easy to perform. Statistical-mechanical analysis is particularly suitable for signal processing that involves an adaptive filter with a large tap length, as is common in practical acoustic systems. Note that we are not implying that statistical-mechanical analysis is superior to traditional statistical analysis, since the large-system-limit assumption can be a weakness in some cases. Our group [34], [35] has also analyzed feed-forward ANC updated by the Filtered-X LMS (FXLMS) algorithm using the statistical-mechanical method. However, those analyses dealt with the case where the primary path, secondary path, and adaptive filter are all linear. Our group [36] has also analyzed a model in which both the unknown system and the adaptive filter have the Volterra-type nonlinearity [37] as an application of the statistical-mechanical method to nonlinear adaptive signal processing. Although Volterra filters have nonlinear characteristics, it was relatively easy to apply the statistical-mechanical method used for linear systems to their analysis. However, the method was only applicable to Volterra filters of a specific order.
Moreover, in the previous study, we did not deal with simple nonlinearities such as the saturation and dead-zone types, which are often found in actual adaptive signal processing systems.
As described above, there have been many studies on adaptive signal processing with nonlinear components. The erf-type saturation function has been well analyzed in previous studies. However, the clipping saturation type, i.e., the piecewise linear type, is an important alternative expression for the saturation phenomena of components constituting adaptive systems such as power amplifiers and transducers such as loudspeakers and microphones. As evidence of this, in their paper [27] on the analysis of the dead-zone type rather than the saturation type, Costa et al. stated that to facilitate the development of analytical models, it is convenient to approximate the piecewise nonlinearity by a continuous and more mathematically tractable function. Moreover, they [27] compared the results of the erf-type analysis with the results of computer simulations performed with piecewise linearity. However, there have been few studies on clipping saturation-type nonlinearity; in particular, there have been no analytical studies to the best of our knowledge.
In light of these developments, in this paper, we analyze the behaviors of a system with an adaptive filter whose output has clipping saturation-type nonlinearity, i.e., piecewise linearity. The main contributions of this paper are as follows:
• We analyze the dynamical and steady-state behaviors of an adaptive system in which the output of the adaptive filter has the clipping saturation-type nonlinearity, which is an alternative expression for saturation phenomena other than erf-type saturation, by the statistical-mechanical method.
• To this end, we introduce two macroscopic variables to represent the macroscopic state of the system and derive the simultaneous differential equations that describe the system behaviors in a deterministic and closed form under the long-filter assumption. Although the derived equations cannot be solved analytically, we discuss the dynamical behaviors and steady state of the adaptive system by performing numerical calculation, asymptotic analysis, and steady-state analysis. In the analysis, we do not assume that the step size is small.
• It is clarified that the saturation value S has a critical value S_C at which the system's mean-square stability and instability switch. That is, when S > S_C, both the mean-square error (MSE) and the mean-square deviation (MSD) converge, i.e., the adaptive system is mean-square stable. On the other hand, when S < S_C, the MSD diverges, i.e., the adaptive system is not mean-square stable. However, even when S < S_C, the MSE converges; the converged value is a quadratic function of S and does not depend on the step size. Finally, an exact expression for S_C is derived by asymptotic analysis.
• The proposed theory not only replaces the erf-type nonlinearity analyzed in [22] with a clipping saturation-type nonlinearity but is also generalized to large step sizes and to the case where the nonlinearity is so strong that the mean-square stability of the system does not hold.
The rest of this paper is organized as follows. In Sec. II, we define the model analyzed in this study. In Sec. III, we describe the statistical-mechanical analysis of the model in detail. In Sec. IV, we demonstrate the validity of the obtained theory by comparing theoretical results with simulation results. We also clarify that there exists a critical value S_C of the saturation value S at which the properties of the adaptive system switch markedly. In addition, some results obtained by steady-state, asymptotic, and numerical analyses are shown. Furthermore, we obtain the exact value of S_C by asymptotic analysis. In Sec. V, we conclude this paper.
Notation: Scalars are denoted by lowercase fonts. Exceptionally, Q, S, S_C, N, and M are also scalars in accordance with the conventions used in the corresponding literature. Column vectors are denoted by bold lowercase fonts and matrices by bold uppercase fonts. The superscripts T and −1 denote transpose and inverse, respectively, while ⟨·⟩ stands for expectation. Finally, if x is a vector, then ||x||_2 denotes its ℓ_2-norm.

II. MODEL

Figure 1 shows a block diagram of the adaptive system analyzed in this paper. The impulse response g_0 of the unknown system G is an M-dimensional arbitrary vector and is time-invariant. The adaptive filter W is an N-tap finite-duration impulse response (FIR) filter. Its coefficient vector is w(n) = [w_1(n), ..., w_N(n)]^T, where n denotes the time step. Although the dimension M of g_0 is generally unknown in advance, we assume that the tap length N of the adaptive filter W is set to satisfy N ≥ M because it is straightforward to design an adaptive filter W of tap length N with a margin. Also, let g be the vector made into N dimensions by adding N − M zeros to g_0. That is, g = [g_0^T, 0, ..., 0]^T. Note that while it is assumed in most previous studies that the dimensions of g_0 and w are the same [22]–[24], [27]–[29], our model essentially allows an arbitrary g_0 and does not make strict assumptions on its dimension M or its elements. We define the parameter σ_g^2 as σ_g^2 ≡ ||g||_2^2 / N. (6) As will become clear later, our theory depends on the unknown system G only via σ_g^2. To demonstrate this, we will show in Sec. IV that the theory is valid for an actual g_0 obtained experimentally.
The input signal u(n) is assumed to be independently drawn from a distribution with ⟨u(n)⟩ = 0 and ⟨u(n)^2⟩ = σ^2. (7) That is, the input signal is white. Although the assumption that the input signal is white may seem to be a major constraint in this analysis, the white input model is an important part of practical applications, especially in system identification, and provides clear insight into the behavior of the algorithm. Moreover, the white input case provides a baseline for other cases [22]. The tap input vector u(n) at time step n is u(n) = [u(n), u(n − 1), ..., u(n − N + 1)]^T. Note that only the mean and variance of the distribution are specified in (7); no specific distribution, for example the Gaussian distribution, is assumed. The outputs of the unknown system G and the adaptive filter W are convolutions of their own coefficients with the sequence of input signals. That is, the outputs d(n) of G and y(n) of W are respectively d(n) = g^T u(n) and y(n) = w(n)^T u(n). The nonlinearity of the adaptive filter W is modeled by the function f placed after W. The function f represents the clipping saturation-type nonlinearity and is expressed as f(y) = −S for y < −S, f(y) = y for −S ≤ y ≤ S, and f(y) = S for y > S, where S is a nonnegative real number. Figure 2 shows the function f. The error signal e(n) is generated by adding an independent background noise ξ(n) to the difference between d(n) and the saturated output f(y(n)). That is, e(n) = d(n) − f(y(n)) + ξ(n). (12) Here, the mean and variance of ξ(n) are zero and σ_ξ^2, respectively. Note that only the mean and variance of the distribution are specified; no specific distribution, for example the Gaussian distribution, is assumed for the background noise.
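As a minimal sketch (the function names are ours, not the paper's), the clipping nonlinearity and the error-signal generation (12) can be written as:

```python
def f(y, S):
    """Clipping saturation-type nonlinearity: identity inside [-S, S],
    clamped to -S or S outside."""
    return max(-S, min(S, y))

def error_signal(d, y, xi, S):
    """e(n) = d(n) - f(y(n)) + xi(n): unknown-system output minus the
    saturated filter output, plus background noise."""
    return d - f(y, S) + xi
```

For example, with S = 1, an output y = 2.7 is clamped to 1, while y = 0.4 passes through unchanged.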
The LMS algorithm [32] is used to update the adaptive filter. That is, w(n + 1) = w(n) + μ e(n) u(n), (13) where μ is a positive real number called the step size.
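One full iteration of the model, i.e., the filter outputs, clipping, error generation (12), and the LMS update (13), can be sketched as follows (a minimal illustration; the helper names are ours):

```python
def dot(a, b):
    """Plain inner product of two coefficient lists."""
    return sum(x * y for x, y in zip(a, b))

def adapt_step(w, g, u, xi, S, mu):
    """One iteration of the model: unknown-system output d(n),
    adaptive-filter output y(n), clipped output f(y(n)), error
    signal (12), and the LMS update (13)."""
    d = dot(g, u)                      # unknown-system output d(n)
    y = dot(w, u)                      # adaptive-filter output y(n)
    e = d - max(-S, min(S, y)) + xi    # error signal with saturation
    w_next = [wi + mu * e * ui for wi, ui in zip(w, u)]
    return w_next, e
```

With w(0) = 0, the first error equals d(0) + ξ(0) since the filter output is still zero.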

III. ANALYSIS
In this section, we theoretically analyze the behaviors of the adaptive system with clipping saturation-type nonlinearity by the statistical-mechanical method. From (12), the MSE is expressed as MSE ≡ ⟨e(n)^2⟩ = ⟨d^2⟩ − 2⟨d f(y)⟩ + ⟨f(y)^2⟩ + σ_ξ^2. (15) In this section, we omit the time step n unless otherwise stated to avoid rather cumbersome notation. We assume N → ∞ while keeping both σ_g^2 and ρ^2 ≡ Nσ^2 (16) constant, in accordance with the statistical-mechanical method [33]. The normalized LMS (NLMS) algorithm [1], [2] is a practically important variant of the LMS algorithm, whose update rule is w(n + 1) = w(n) + (μ̃ / ||u(n)||_2^2) e(n) u(n), where μ̃ is the step size. Note that since ||u(n)||_2^2 → Nσ^2 = ρ^2 from the law of large numbers, the analysis in this paper is equivalent to the analysis of the NLMS algorithm with μ̃ = ρ^2 μ as the step size for a stationary input signal u(n). Then, from the central limit theorem, both d and y are stochastic variables that obey the Gaussian distribution. Their means are zero, and their variance–covariance matrix is Σ = ρ^2 [[σ_g^2, r], [r, Q]]. (19) Here, Q and r are macroscopic variables that are respectively defined as Q ≡ ||w||_2^2 / N (20) and r ≡ g^T w / N. (21) The derivation of the means and variance–covariance matrix is given in detail in Appendix A.
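The claim that only the mean and variance of the input matter can be illustrated numerically. The sketch below is our own construction (the binary ±σ input and all parameter values are illustrative, not from the paper); it checks by Monte Carlo that ⟨d y⟩ ≈ ρ² r even for a markedly non-Gaussian input:

```python
import math
import random

random.seed(1)
N, trials = 200, 4000
rho2 = 1.0
sigma = math.sqrt(rho2 / N)      # per-sample std, so ||u||_2^2 = rho^2 exactly
g = [1.0] * N                    # sigma_g^2 = ||g||_2^2 / N = 1
w = [1.0 if i % 2 == 0 else 0.0 for i in range(N)]   # Q = 0.5, r = 0.5
r = sum(a * b for a, b in zip(g, w)) / N

acc = 0.0
for _ in range(trials):
    d = y = 0.0
    for i in range(N):
        u_i = sigma if random.random() < 0.5 else -sigma   # binary input
        d += g[i] * u_i
        y += w[i] * u_i
    acc += d * y
cov_dy = acc / trials
# theory: cov_dy is close to rho2 * r = 0.5
```

The binary input has the same mean and variance as a Gaussian one, which is all the covariance structure (19) depends on.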
We obtain the three sample means in (15) as follows by carrying out the Gaussian integration for d and y:
⟨d^2⟩ = ρ^2 σ_g^2, (22)
⟨d f(y)⟩ = ρ^2 r erf(S / √(2 ρ^2 Q)), (23)
⟨f(y)^2⟩ = ρ^2 Q erf(S / √(2 ρ^2 Q)) − √(2/π) √(ρ^2 Q) S e^{−S^2/(2 ρ^2 Q)} + S^2 [1 − erf(S / √(2 ρ^2 Q))], (24)
where erf(·) is the error function defined as erf(x) ≡ (2/√π) ∫_0^x e^{−t^2} dt. (25) Equation (22) is easily derived from (19). Equations (23) and (24) are derived in detail in Appendices B and C, respectively. From (15), (22), (23), and (24), we obtain the MSE in closed form (26). This formula shows that the MSE is a function of the macroscopic variables Q and r. Therefore, we derive differential equations that describe the dynamical behaviors of these variables in the following. Multiplying both sides of (13) on the left by g^T and using (21), we obtain N r(n + 1) = N r(n) + μ e(n) d(n). (27) We introduce the time t ≡ n/N (28) and use it to represent the adaptive process. Then, t becomes a continuous variable since the limit N → ∞ is considered. These calculations are in line with the statistical-mechanical analysis of online learning [38].
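The Gaussian integrals behind such sample means can be checked by simple quadrature. The sketch below (our own cross-check, not the paper's code) evaluates ⟨y f(y)⟩ for y ~ N(0, var_y) numerically and compares it with the closed form var_y · erf(S/√(2 var_y)), which we derived independently and use only as a consistency check:

```python
import math

def mean_y_fy_quad(S, var_y, steps=200_000, width=10.0):
    """<y f(y)> for y ~ N(0, var_y) and the clipping nonlinearity f,
    computed by trapezoidal quadrature over [-width*sd, width*sd]."""
    sd = math.sqrt(var_y)
    a = -width * sd
    h = 2.0 * width * sd / steps
    total = 0.0
    for i in range(steps + 1):
        y = a + i * h
        fy = max(-S, min(S, y))                       # clipping nonlinearity
        phi = math.exp(-y * y / (2.0 * var_y)) / math.sqrt(2.0 * math.pi * var_y)
        wgt = 0.5 if i in (0, steps) else 1.0          # trapezoid end weights
        total += wgt * y * fy * phi
    return total * h
```

For S much larger than the standard deviation, the clipping is inactive and the mean reduces to the variance of y, as expected.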
Consider the N dt updates from time step n to time step n + N dt. From (27),
N r(n + 1) = N r(n) + μ e(n) d(n),
⋮
N r(n + N dt) = N r(n + N dt − 1) + μ e(n + N dt − 1) d(n + N dt − 1).
Summing all these equations, we obtain N r(n + N dt) − N r(n) = N dt μ ⟨e d⟩. Therefore, we obtain dr/dt = μ ⟨e d⟩. (33) Here, from the law of large numbers, we have represented the effect of the probabilistic variables by their means since the updates are executed N dt times, that is, many times, to change r by dr. This property is called self-averaging in statistical mechanics [33]. From (12) and (33), we obtain a differential equation that describes the dynamical behavior of r in a deterministic form as follows: dr/dt = μ (⟨d^2⟩ − ⟨d f(y)⟩). (34) Next, squaring both sides of (13) and proceeding in the same manner as for the derivation of the above differential equation for r, we can derive a differential equation for Q, which is given by dQ/dt = 2μ (⟨d y⟩ − ⟨y f(y)⟩) + μ^2 ρ^2 ⟨e^2⟩. (35) Here, we used the fact that in the limit N → ∞, u^T u = ||u||_2^2 = Σ_{i=1}^{N} u(n − i + 1)^2 is no longer a random variable but the constant ρ^2 = Nσ^2 from the law of large numbers, so it can be taken out of the expectation operation ⟨·⟩. This is precisely one of the strengths of the statistical-mechanical method, which assumes N → ∞.
Equations (34) and (35) include five sample means. Because three of the five means are already given in (22)–(24), we obtain the two remaining means as follows by carrying out the Gaussian integration for d and y: ⟨d y⟩ = ρ^2 r (36) and ⟨y f(y)⟩ = ρ^2 Q erf(S / √(2 ρ^2 Q)). (37) Equation (36) is easily derived from (19). Equation (37) is derived in detail in Appendix D.
Substituting (22)–(24), (36), and (37) into (34) and (35), we obtain the concrete formulas of the simultaneous differential equations, (38) and (39). Since they cannot be solved analytically, we solve the derived simultaneous differential equations numerically. Substituting the obtained numerical solution into (26), we obtain the MSE learning curves.
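The numerical solution uses a Runge–Kutta scheme (Sec. IV). A generic classical fourth-order Runge–Kutta stepper of the kind needed is sketched below; note that the right-hand sides here are a stand-in toy system with a known solution r(t) = 1 − e^{−t}, Q(t) = (1 − e^{−t})², not the paper's (38) and (39):

```python
import math

def rk4_step(rhs, state, t, h):
    """One classical fourth-order Runge-Kutta step for state' = rhs(t, state)."""
    k1 = rhs(t, state)
    k2 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k1)])
    k3 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k2)])
    k4 = rhs(t + h, [s + h * k for s, k in zip(state, k3)])
    return [s + h / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

# Stand-in right-hand sides (NOT (38) and (39)): dr/dt = 1 - r,
# dQ/dt = 2 (r - Q), with exact solution r = 1 - e^{-t}, Q = r^2.
def rhs(t, state):
    r, Q = state
    return [1.0 - r, 2.0 * (r - Q)]

state, t, h = [0.0, 0.0], 0.0, 0.01
while t < 1.0 - 1e-12:
    state = rk4_step(rhs, state, t, h)
    t += h
```

To solve the actual system, one would replace `rhs` with the right-hand sides of (38) and (39) and integrate from r(0) = Q(0) = 0.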
From (6), (20), and (21), we can also obtain the MSD, or misalignment, as a function of the macroscopic variables Q and r as follows: MSD ≡ ||g − w||_2^2 = N (σ_g^2 − 2r + Q). (42) Equation (42) shows that the MSD is proportional to the tap length N in the model setting in this paper; therefore, we normalize the MSD by N and call the result the normalized MSD.
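Under our reading of the definitions (6), (20), and (21), i.e., σ_g² = ||g||²/N, Q = ||w||²/N, and r = gᵀw/N (an assumption on our part), the MSD identity above is a purely algebraic bookkeeping fact and can be checked mechanically:

```python
def macroscopic_msd(g, w):
    """MSD via the macroscopic variables: ||g - w||_2^2 = N (sigma_g^2 - 2 r + Q),
    assuming sigma_g^2 = ||g||^2/N, r = g.w/N, Q = ||w||^2/N."""
    N = len(g)
    sigma_g2 = sum(x * x for x in g) / N
    r = sum(a * b for a, b in zip(g, w)) / N
    Q = sum(x * x for x in w) / N
    return N * (sigma_g2 - 2.0 * r + Q)

def direct_msd(g, w):
    """MSD computed directly as the squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(g, w))
```

Both routes give the same number for any pair of coefficient vectors of equal length.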

IV. RESULTS AND DISCUSSION
A. Learning curves
We first investigate the validity of the theory by comparing theoretical results with simulation results with regard to the dynamical behaviors of the MSE and normalized MSD, that is, the learning curves. Figures 3 and 4 show the learning curves obtained using the theory derived in the previous section, along with the corresponding simulation results. In these figures, the curves represent theoretical results and the polygonal lines represent simulation results. In the theoretical calculations and simulations throughout this section, ρ^2 = σ_g^2 = 1. In the theoretical calculation, the results are obtained by substituting r and Q, which are respectively obtained by solving (38) and (39), into (26) and (42). Here, (38) and (39) are numerically solved by the Runge–Kutta method. In the computer simulations, the number of taps of the adaptive filter W is N = 400, and ensemble means over 1000 trials are plotted. The impulse response g_0 of the unknown system G in all computer simulations in this paper is obtained experimentally by measurement [39] and is shown in Fig. 5. Its dimension M is 256. Note that g_0 has been normalized to satisfy (6). All initial values w_i(0), i = 1, ..., N of the coefficients are set to zero in the simulations, and the initial condition r(0) = Q(0) = 0 is used in the theoretical calculation. Figures 3 and 4 show that the theory derived in this paper predicts the simulation results well in terms of average values.
Since statistical-mechanical analysis assumes N → ∞, it is important to investigate how well the theory explains the simulation results with small N. Figure 6 shows the theoretical learning curves along with the simulation results for N = 400, 100, 20, and 5. In the computer simulations, the unknown-system coefficient vectors are generated by uniformly sampling the impulse response shown in Fig. 5 and normalizing them to satisfy (6). Figure 6 shows that, especially when S and μ are large, there is significant disagreement between the simulation results with small N and the theoretical results. The reason for this disagreement is considered to be that the theory is derived under the long-filter assumption, that is, N → ∞, whereas the simulations are executed using finite values of N. The dependence of the disagreement on N is an example of the phenomenon known as the finite-size effect in statistical mechanics. Note, however, that when S or μ is small, the theory is in good agreement with the simulation even if N = 20 or 5. Figure 7 shows examples of the MSE learning curves when the initial values w_i(0), i = 1, ..., N of the filter coefficients are not zero. In the computer simulation, the initial values of the filter coefficients are independently drawn from a Gaussian distribution with mean 0 and variance 1. Correspondingly, the initial conditions r(0) = 0 and Q(0) = 1 are used in the theoretical calculation. Figures 7a–7d show that the theory derived in this paper predicts the simulation results well even if the initial values of the filter coefficients are not zero. Comparing these figures with Figs. 3a–3d, it can be seen that the initial values of the MSE are increased by the nonzero initial values of the filter coefficients. On the other hand, their steady-state values remain the same. Hereafter, the initial values of the filter coefficients are assumed to be zero.
From Figs. 3a–3d, it seems that the MSE almost converges at t = 50 regardless of the step size μ for both S = 1 and 3. However, Figs. 4a and 4c show that the normalized MSD continues to increase for S = 1. Next, we show the MSE at t = 10, 100, and 1000 in Figs. 8 and 9 to investigate the relationship between the saturation value S and the MSE. For the computer simulations, the medians and standard deviations over 100 trials are plotted using error bars. Figures 8 and 9 show that the MSE increases when S is in the range of 1.1–1.3, and this tendency becomes increasingly pronounced with time.

B. Critical value S_C and steady-state analysis when S > S_C
To clarify the phenomenon of increasing MSE described in the previous subsection, we investigate the steady-state values of the macroscopic variables r, Q, and the MSE. If steady-state values of r and Q exist, they can be obtained by numerically solving the simultaneous equations obtained by setting the left-hand sides of the simultaneous differential equations (38) and (39) to zero. Figure 10 shows the numerically obtained results for Q along with the corresponding simulation results at t = 10^4. This value of t in the simulation is sufficient for Q to reach a steady-state value. When the saturation value S is larger than S_C = 1.25331···, a numerical solution is found. However, when S is smaller than S_C, no numerical solution is found. Of course, the absence of a numerical solution to the simultaneous equations is only a necessary condition for r and Q to diverge, not a sufficient one. In other words, the absence of a numerical solution does not necessarily indicate that r or Q diverges. It would, of course, be most desirable to prove mathematically that r and/or Q diverge, but this is difficult because of the complexity of the simultaneous differential equations (38) and (39). Therefore, we investigate the dynamical behavior of Q for S < S_C by means of theoretical numerical calculations and computer simulations. Figures 11a and 11b show the results. In these figures, Q increases monotonically in all cases. These results strongly indicate that Q diverges regardless of the step size and the presence of background noise when S < S_C. Since Q is proportional to the square of the ℓ_2-norm of w, as seen from (20), the divergence of Q means the divergence of the coefficient vector w of the adaptive filter W. When S > S_C, we can obtain the steady-state MSE by substituting the steady-state values of r and Q into (26). Note that it is clarified in Sec. IV-D that the exact value of S_C = 1.25331··· is √(π/2).

C. Asymptotic analysis when S < S_C
On the basis of the results obtained in the previous subsection, we henceforth assume that Q diverges in the limit t → ∞ when S < S_C. However, Figs. 8 and 9 show that the MSE appears to converge even when S < S_C. Therefore, an asymptotic analysis of the behavior of the system when S < S_C is given in this subsection. From (6), (20), and (21), we obtain cos θ = r / (σ_g √Q), (43) where θ is the angle between the vectors g and w. From (38), (39), and (43), we obtain lim_{t→∞} cos θ = 1, (44) where the asymptotic forms (45) and (46) are used. Equation (44) means that the directions of g and w come to coincide even when S < S_C, although w diverges as described in Sec. IV-B. Note that this property does not depend on μ, ρ^2, σ_g^2, or σ_ξ^2. As described so far, Q diverges and cos θ = 1 in the limit t → ∞ when S < S_C. Then, the MSE is MSE = S^2 − 2√(2/π) σ_g ρ S + σ_g^2 ρ^2 + σ_ξ^2. (47) Equation (47) is derived in detail in Appendix F. Although w diverges when S < S_C, (47) shows that the MSE converges. The converged value does not depend on the step size μ, and it is a quadratic function of S. The converged value takes its minimum, σ_g^2 ρ^2 (1 − 2/π) + σ_ξ^2, when S = √(2/π) σ_g ρ. Figure 12 shows the theoretically obtained steady-state values of cos θ = r/(σ_g √Q), along with the corresponding simulation results at t = 10^4. This value of t in the simulation is sufficient for cos θ to reach a steady-state value. In this figure, for S > S_C, the theoretical values calculated using the steady-state values of r and Q given in Sec. IV-B are plotted. On the other hand, for S < S_C, the theoretical value given by (44) is plotted. Figure 13 shows the theoretically obtained steady-state values of the MSE, along with the corresponding simulation results at t = 10^4. This value of t in the simulation is sufficient for the MSE to reach a steady-state value. In this figure, for S > S_C, the theoretical results of the steady-state analysis described in Sec. IV-B are plotted. On the other hand, for S < S_C, the theoretical result given by (47) is plotted. As Fig. 13 and (47) show, for S < S_C, the steady-state MSE is independent of the step size μ. On the other hand, for S > S_C, the steady-state MSE depends on μ, but the MSE never diverges because |f(y(n))| ≤ S. It follows that there is no upper bound on μ for the MSE to converge. Figure 14 shows the theoretically obtained steady-state normalized MSD, along with the corresponding simulation results at t = 10^4. This value of t in the simulation is sufficient for the normalized MSD to reach a steady-state value. This figure shows that the normalized MSD diverges in the limit S ↓ S_C because Q diverges, where S ↓ S_C denotes that S decreases in value approaching S_C. That is, the condition under which the adaptive system is mean-square stable [2] is S > S_C. Although not mathematically proven, Fig. 14 suggests that for S > S_C, the normalized MSD converges regardless of the step size μ.
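The quadratic dependence of the steady-state MSE on S for S < S_C can be sketched as follows. The explicit quadratic below is our reconstruction of (47), chosen to be consistent with the stated minimum location √(2/π) σ_g ρ and minimum value σ_g²ρ²(1 − 2/π) + σ_ξ²; the reasoning in the comment is ours:

```python
import math

def asymptotic_mse(S, sigma_g=1.0, rho=1.0, sigma_xi2=0.0):
    """Steady-state MSE for S < S_C (our reconstruction): when Q diverges
    and cos(theta) -> 1, f(y) -> S * sign(y), and for jointly Gaussian d, y
    with correlation 1, <d sign(y)> = sqrt(2/pi) * sigma_g * rho, giving
    MSE = S^2 - 2 sqrt(2/pi) sigma_g rho S + sigma_g^2 rho^2 + sigma_xi2."""
    return (S * S
            - 2.0 * math.sqrt(2.0 / math.pi) * sigma_g * rho * S
            + sigma_g ** 2 * rho ** 2
            + sigma_xi2)

S_star = math.sqrt(2.0 / math.pi)   # minimizing S for sigma_g = rho = 1
```

At S = 0 the filter output is clipped to zero and the expression reduces to σ_g²ρ² + σ_ξ², the power of the uncancelled desired signal plus noise, as it should.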
D. Exact derivation of the critical value S_C

As described so far, the properties of the adaptive system switch markedly at S = S_C. That is, when S > S_C, the MSE and normalized MSD converge; in other words, the adaptive system is mean-square stable. The converged values depend on the step size μ. On the other hand, when S < S_C, the normalized MSD diverges and the MSE converges to a value that does not depend on μ. In addition, the angle between the coefficient vector g of the unknown system G and the coefficient vector w of the adaptive filter W converges to zero.
In this subsection, we analytically obtain the critical value S_C. As described in Sec. IV-B, the steady-state values of r and Q satisfy the equations obtained by setting the left-hand sides of (38) and (39) to zero. As described in Secs. IV-B and IV-C, Q diverges and cos θ → 1 as S ↓ S_C. Substituting (45), (46), and (48)–(50) into (38) and (39) and solving for S_C, we obtain S_C = σ_g ρ √(π/2). (51) Equation (51) shows that the critical value S_C is proportional to σ_g and ρ, defined by (6) and (16), respectively. In addition, we see that the critical value S_C = 1.25331··· numerically obtained in Sec. IV-B is, in fact, √(π/2).
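A one-line numerical check that the exact expression agrees with the numerically observed critical value (σ_g = ρ = 1, as used in Sec. IV):

```python
import math

# Critical saturation value from (51), S_C = sigma_g * rho * sqrt(pi / 2),
# evaluated for sigma_g = rho = 1.
sigma_g, rho = 1.0, 1.0
S_C = sigma_g * rho * math.sqrt(math.pi / 2.0)
# This agrees with the value 1.25331... found numerically in Sec. IV-B.
```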

V. CONCLUSIONS
We analyzed the behaviors of an adaptive system in which the output of the adaptive filter has the clipping saturation-type nonlinearity by the statistical-mechanical method. As a result, it was clarified that the saturation value S has a critical value S_C at which the system's mean-square stability and instability switch. Finally, S_C was exactly derived by asymptotic analysis. Although the theory by Costa et al. [22] and the theory proposed in this study take completely different approaches in terms of assumptions and methods, both reveal the existence of critical values for the nonlinearity, which is extremely interesting. That is, η^2 = 1 is the condition for establishing the "power threshold" in [22], and S = S_C = σ_g ρ √(π/2) is the critical condition in our theory. However, our theory is valid for arbitrary step sizes, whereas Costa et al. [22] assume small step sizes. Our theory also well explains the simulation results in the case of strong nonlinearity S < σ_g ρ √(π/2), which corresponds to η^2 > 1 in [22]. The findings in this study are non-trivial, albeit obtained for classical and simple models, and are significant both theoretically and practically.
Interesting theoretical problems that should be studied further remain. Analyses of other types of nonlinearities should be performed. For example, a dead-zone-type nonlinearity expressed by a piecewise linearity can be analyzed by the method described in this paper. In addition, the nonlinearities of other components should be analyzed in future studies.