Skew Filtering for Online State Estimation and Control

Process control can become challenging when the measurements are affected by irregular noise. Classical approaches utilize Gaussian methods to alleviate the sensory noise. However, many industries involve skewed noise in their processes. While the closed skew-normal (CSN) distribution generalizes a Gaussian distribution with additional parameters, its dimension increases during recursive estimation, making it impractical. Even though there are some techniques for the solution, they are typically too complicated or inaccurate for higher-dimensional problems. This study proposes a novel online optimization scheme to reduce the dimensionality of a CSN distribution while considering the properties of the complete empirical distribution. Since the objective function used during the optimization step considers the geometry of the metric space, the proposed scheme achieves higher accuracy without sacrificing computational efficiency. The proposed filter is applied to two pilot-scale experiments. The results indicate that it is beneficial for recursive state estimation in the presence of skewed noise.


Skew Filtering for Online State Estimation and Control
Oguzhan Dogru , Ranjith Chiplunkar , and Biao Huang , Fellow, IEEE Abstract-Process control can become challenging when the measurements are affected by irregular noise.Classical approaches utilize Gaussian methods to alleviate the sensory noise.However, many industries involve skewed noise in their processes.While the closed skewnormal (CSN) distribution generalizes a Gaussian distribution with additional parameters, its dimension increases during recursive estimation, making it impractical.Even though there are some techniques for the solution, they are typically too complicated or inaccurate for higherdimensional problems.This study proposes a novel online optimization scheme to reduce the dimensionality of a CSN distribution while considering the properties of the complete empirical distribution.Since the objective function used during the optimization step considers the geometry of the metric space, the proposed scheme achieves higher accuracy without sacrificing computational efficiency.The proposed filter is applied to two pilot-scale experiments.
The results indicate that it is beneficial for recursive state estimation in the presence of skewed noise.

I. INTRODUCTION
I NDUSTRIAL processes require safe, economically feasible operation while providing optimal production.To satisfy such requirements, sensory measurements are utilized in monitoring and control applications.However, physical limitations often affect these measurements, which reduces the measurement quality.Various techniques have been developed to quantify and mitigate the noise such that the operational requirements are met efficiently and effectively.A common assumption in such techniques is that the state of interest and the sensory measurements exhibit Gaussian properties.
In the presence of Gaussian noise in both state and measurements, one can obtain an optimal estimation by using a Bayesian filter, or particularly a Kalman filter [1], [2], [3], [4].Because the Kalman filter consists of simple linear operations, the algorithm can be applied in both online and offline settings, which is favorable in various applications.Although assuming Gaussian behavior provides effective solutions while estimating the variables of interest, the noise affecting state and measurements may not be completely Gaussian.In this case, more complex filters with additional assumptions and modifications may be needed.
This study focuses on a type of continuous probability distributions, where a Gaussian distribution is generalized with nonzero skewness.This property can be found in variables that range from truncated Gaussian to normal distributions.Some examples of variables with such properties can be found in the domains of actuarial science [8], aerospace engineering [9], biology [10], chemical engineering [11], [12], climatology [13], communication/electronics [14], defense [15], earth sciences [16], economics/finance [17], mathematics [18], process systems and control [19], and statistics [20].In the presence of noisy measurements, Bayesian estimation with distributions like skew-t [21] or Weibull can provide optimal estimations for these variables.Despite that these distributions can explain the skewness of the variables, they may not be sufficient to recover the actual distribution due to their low degree of freedom, or they may not have a closed-form solution in the estimation scheme, which increases the complexity and computational burden [22].
On the contrary, a closed-skew normal (CSN) distribution is a function with five parameters that generalizes the normal distribution.This generalization accounts for nonzero skewness as well as different location and scale information that can further adjust the shape of the distribution [23].The CSN distribution is closed under the linear transformation of CSN variables, marginalization, and Bayesian inversion [24].These properties are vital in developing recursive filtering schemes for linear dynamic systems characterized by skewed noises.Rezaie and Eidsvik [9] provide an overview of the filtering schemes for linear systems with CSN noise.In addition, several methods, such as ensemble filter [9], unscented Kalman filter [18], etc., have been developed for nonlinear systems with CSN noise.Although a CSN filter can provide effective estimations, given the noisy observations, it suffers from increasing dimensionality during recursive Bayesian estimation, making the filter impractical for online implementations.Despite several approaches that have been proposed to solve the dimensionality problem, they either introduce the skewness indirectly in the initial state [25]  or consider partial information about the distribution of interest [11], which may yield inaccurate approximation.
This study, on the other hand, proposes a numerical optimization technique to overcome the increasing dimensionality problem without loss of generality.Since the proposed technique takes the entire CSN distribution into account at every time step, it provides optimal approximations to the actual high-dimensional distribution.In addition, to online state estimation [26], the filter can also be used to develop advanced controllers such as reinforcement learning (RL) agents or model predictive controllers (MPC).The contributions of this study are as follows: 1) developing a dimensionality reduction technique for online skew state estimation; 2) theoretically and empirically comparing different objective functions and optimization tools for the proposed method; 3) utilizing the proposed filtering scheme in online RL and MPC applications; 4) quantitatively analyzing the proposed scheme by considering both filtering and control through two pilot-scale experiments.The rest of this article is organized as follows, detailed background information is provided in Section II, the proposed method is described in Section III, the results are discussed in Section IV, and future research direction is provided in Section V. Finally, Section VI concludes this article.

II. BACKGROUND
A process can be controlled by using different techniques such as learning-based or model-based controllers if the state of interest is available at all time steps.Since the practical control or monitoring schemes rely on noisy measurements, an accurate state estimation scheme is required to achieve satisfactory control performance.This study develops an optimal state estimator in the presence of skew measurement noise and applies it to real-time state estimation and control problems.Table I tabulates the notation of this article.

A. Kalman Filtering
Physical processes may be represented by a discrete-time state-space model as shown in (1) and (2). (1) where

B. Closed-Skew Normal Distribution
The multivariate CSN generalizes the Gaussian distribution by introducing skewness [27] into Gaussian models and was introduced in [23].Consider two random vectors, τ ∈ R n 2 and ξ ∈ R q 2 .Joint probability density function (pdf) of τ and ξ can be written as follows: where μ τ and μ ξ are location vectors, and Σ τ , Γ ξτ , and Σ ξ are covariance matrices.Then, the CSN variable, x, can be defined as x = [τ |ξ ≥ 0] with a distribution function given in where Φ and φ are the cumulative distribution function (cdf) and pdf of an n 2 dimensional Gaussian, respectively.A multivariate CSN can be parameterized as shown in the following equation: where Γ is the skewness, μ and ν are the location, and Σ and Δ are the scale parameters, respectively.It can be noted that setting Γ = 0 reduces the CSN distribution to a Gaussian distribution, and hence, CSN distribution generalizes the Gaussian distribution.

III. OPTIMAL DIMENSIONALITY REDUCTION FOR ONLINE IMPLEMENTATION
Similar to the Kalman filter, CSN parameters can be calculated recursively [9].Assume t ∼ N ( t ; 0, Σ ) and ψ t ∼ CSN(ψ t ; μ ψ , Σ ψ , Γ ψ , ν ψ , Δ ψ ).For the sake of simplicity in notation, assume that the statistical properties of and ψ do not vary over time.The prediction step for the corresponding CSN filter can be derived as shown in [9] x− where (•) − and (•) + represent a priori and a posteriori estimates, respectively.Furthermore, the update step can be derived as follows [9]: where K t is the Kalman gain, and Σ ,t and Σ ψ,t are the known state and measurement covariances, respectively.In addition, the expected mean of the state of interest can be calculated as [28] E . (8) Note that the dimension of the parameters that introduce skewness is increased by one at the end of each update step.This complicates the solution because one cannot derive (8) analytically when the dimension continuously increases (when q 2 > 1).In addition, this inevitable increase causes memory issues that make the skew filter impractical for online implementation.This study proposes a numerical optimization solution to the dimensionality increase problem by approximating the 2-D CSN distribution and CSN n 2 ,2 (x; μ, Σ, Γ, ν, Δ), at the end of each update step.Consider a 1-D CSN, CSN n 2 ,1 (x; Θ) where Θ = {θ 1 , θ 2 , θ 3 , θ 4 , θ 5 } represents its unknown parameters.The goal is to find the best parameters, , such that the difference between the two CSN distributions is minimized.Since n 2 is constant during optimization, it is ignored for the sake of simplicity in notation.A simple solution to this problem is to minimize the difference between Φ 1 (θ 3 (x − θ 1 ); θ 4 , θ 5 ) and Φ 2 (Γ(x − μ); ν, Δ) (the cdf components of the 1-D and 2-D CSNs) since the dimensionality increase is observed in the cdf.However, this solution may miss the useful information that φ(x; μ, Σ) contains.
This study presents an alternative and effective solution where Θ is calculated by considering the five parameters in the distributions.Different distance/divergence indicators such as the Kullback-Leibler (KL), Jensen-Shannon (JS) divergence, total variation distance (TVD), Hellinger distance, density norms, etc., can be used to approximate the high-dimensional CSN.However, these indicators do not consider the underlying space geometry and may yield nonsymmetric, discontinuous, or inaccurate results due to their properties.They can reduce the reliability of the filter or can make it diverge during the estimation.
Based on transport theory, the Wasserstein distance (WD) [29], [30] calculates the deviation between two distribution functions while addressing the abovementioned issues, and it can be derived as shown in where ζ is the order of the WD, F −1 1 (•) and F −1 2 (•) are the quantiles, and F 1 (•) and F 2 (•) represent the cdfs of CSN 1 (•) and CSN 2 (•), respectively, as shown in (10).For notational simplicity, the function parameters are omitted in (9) Note that CSN 1 and CSN 2 are assumed to be in the Wasserstein space W, which makes the WD more effective than the Euclidean distances like the norm functions.Although the WD can be used for any distribution function, the term "CSN" is and will be used for simplicity and demonstration purposes.By definition, the WD satisfies the following useful metric properties: By using ζ = 1 for simplicity and considering the equal integration step dz, the optimization function can be formed as shown in Note that the properties shown in ( 11)-( 13) maintain the convexity of this objective function.Moreover, the optimization Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
Algorithm 1: Online State Estimation With Skew Dimensionality Reduction By Using The SQP Algorithm.
function and the properties would be valid in the presence of discrete random variables.Depending on the design criteria, constraints can be implemented on Θ, which will not be considered in this study.In addition, the proposed methodology considers each dz point in F 1 (z|Θ) and F 2 (z|•).However, the existing methodologies, like the one developed in [11], consider only three dz points (which are user-defined) in Φ q2 and do not contain the information captured in φ n2 (•)).Due to these differences, the existing methodologies may result in inaccurate results.
The nonlinear optimization problem, shown in (14), can be solved optimally by using numerous optimization techniques, including the sample or gradient-based methods.This study utilizes the sequential quadratic programming (SQP) approach due to its simplicity and generality.For example, in the presence of prior knowledge about Θ, constrained optimization can be applied to find Θ * by using SQP.We will also compare it with several optimization techniques, such as particle swarm (PS), conjugate gradient (CG), Nelder-Mead, Broyden-Fletcher-Goldfarb-Shanno (BFGS) Limited-memory Broyden-Fletcher-Goldfarb-Shanno with bound constraints (L-BFGS-B) Constrained optimization by linear approximation (COBYLA), and trust region algorithms, empirically.
The state estimation algorithm with skew dimensionality reduction is given in Algorithm 1.
Note that the SQP algorithm requires an initial set of parameters Θ init that can be tuned based on process knowledge or via a Monte Carlo simulator on a system model.After the initialization step, the proposed algorithm uses the previous parameter set Θ t−1 to calculate the current parameter set Θ t in order to reduce computational complexity.In the case of sample-based optimizers, Θ init is not required, and Θ t−1 does not need to be used in Step 6 of Algorithm 1; however, the sample-based optimizers may require additional exploration/exploitation hyperparameters.
Due to its computational efficiency, this simple yet effective algorithm can also be used in real-time controllers.In the following sections, an MPC (whose details are given in Algorithm 2) will be utilized in a pilot-scale experiment.Due to its simplicity and computational efficiency, the MPC optimizer will be SQP too.Note that the type of optimizers does not affect the fundamental concepts explained previously.Therefore, the optimizers can be substituted by other optimization algorithms depending on the accuracy, computational efficiency, etc., which may differ based on the process of interest.
Note that the optimizer in Algorithm 2 requires a system model, which is equivalent to the model used in Algorithm 1 and is not mentioned explicitly for the sake of simplicity.
Algorithm 3 represents a general data flow in RL agents and solution to reward (considered to be the state) filtering by using the proposed method.In this example, an RL agent interacts with a system in real time by observing a "situation," s t , taking an action u t , and receiving a skew reward y t .Note that the situation corresponds to the "state" in the RL literature, which is renamed for notational simplicity.Note that we will showcase a reward filtering example shortly, which is essentially equivalent to state filtering without loss of generality.Although the proposed method has been tested by using an A3C agent, it can be used to remove the skew noise independent of the type of agent.

IV. RESULTS AND DISCUSSION
The proposed method was tested on two pilot-scale experiments, which represent industrial chemical processes.Implementation details, the experimental setup, and the results obtained by using the proposed online skew filtering and control approaches will be presented in this section.
Note that the CSN noise in this study will be assumed to be zero-mean, as shown in (15).In the presence of biased noise, the filters may need to be improved by incorporating a bias correction mechanism (15) For comparison purposes, an online median and an average filter [31] will be used as shown in ( 16) and ( 17), respectively where MED is the median operator, N ≥ 3 is the window size, β is a fixed weight, and ŷ+ avg,0 = y 0 .As shown in ( 18), the meansquared error (mse) will be utilized to measure the estimation and control quality.
where Y t represents the predicted state ŷt for state estimation and setpoint x t,sp for control.This study aims to minimize the prediction and control errors, mse P and mse C , respectively.

A. Implementation Details
The estimator/controller computers have Intel Core i7-4790 K CPU (4 CPU cores) at 4.00 GHz, and 8 GB DDR4 RAM at 1600 MHz under the 64-b Windows 10 operating system.The filtering and control algorithms were designed in Python v3.7 by using the steps presented in Algorithms 1 and 2.An open-source library, SciPy v1.5.4,was used to design the optimization algorithms.During calculations, real-time communication between the computers (estimator/controller) and experimental setup was established using a socket connection to an Opto22 open platform communication server.Nonetheless, the generality of the proposed method does not depend on the type of connectivity.More information about the experimental setup has been given in [32].

B. Application 1: Online Reward Estimation for the RL Agent in a Primary Separation Vessel
The primary separation vessel (PSV) is a key component in the oil sands industry, where an interface between two liquids is the target variable to be detected and tracked accurately [32].The tank is schematically shown in Fig. 1.Dogru et al. [33] developed a robust interface tracking algorithm, where an RL agent receives a noisy nonlinear reward at each step.Since this reward is utilized for online learning of the agent, it is crucial to reduce the noise effectively in real time.
Without the loss of generality, consider the reward is the state of interest to be estimated (i.e., r t = x t ), which can be modeled   (14) by using Because the RL agent tries to maximize the reward, r t = x t ≤ 0 and y t ≤ 0 are often preferred to provide feasibility in control applications [32], [33].Due to this property, when the noisy reward magnitude is low, the resulting reward becomes a truncated variable [33].Inspired by the work in [9] and [14], we mimic an engineering problem where the sensory measurements are contaminated with skew noise.Since the CSN distribution can generalize the distributions ranging from truncated Gaussian to regular Gaussian, the synthetically added observation noise, ψ, was chosen to follow a CSN distribution.The proposed algorithm will be used to recover the hidden state (true reward) efficiently, as will be discussed in the following paragraphs.
As one of the key components of this study, different distance measures are compared under skew noise with (Γ = 10, Σ = 5), (Γ = 50, Σ = 4), and (Γ = 100, Σ = 20).For optimization, the SQP algorithm was used at this step.As shown in Table II, the Wasserstein-1 distance consistently yielded the lowest mse for these problems, whereas some of the other functions resulted in either higher mse or divergence.In addition, to this quantitative difference, note that WDs theoretical properties provide quantitative improvements as mentioned previously.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.Then, different optimizers were compared in terms of their accuracy and computational speed while using WD as the objective function.As shown in Table III the SQP algorithm yielded the lowest mse and execution time compared to the other optimization algorithms due to its simplicity.On the other hand, the other optimizers either caused higher mse or diverged.Note that some of these optimization schemes were designed for constrained optimization, and thus, the results may differ when there are constraints on Θ.Nevertheless, all tested methods yielded a solution of less than a second for each execution, which showed that all of them could be used for online estimation for systems that have a sampling time of 5 s.In addition, the noise properties and skewness parameters could affect these results, which should be considered during the design phase.
After observing that the SQP method yields the lowest error and computational time, different filters were compared against the SQP-CSN filter under different noise conditions.As shown in Table IV, the median filter (16) has higher error values than the other filters because it starts estimation after N ≥ 3 steps.When the skewness increases, the Kalman filter performs slightly better than the average filter.Nevertheless, the median and average filters are not traditional state estimation methods and do not consider the covariance information.On the contrary, the skew filter yields the lowest error consistently because it is the most flexible method within the compared methods.
Fig. 2 summarizes the experimental state estimation study, where the median filter results in the largest mse, and the proposed method outperforms the other methods.Overall, the SQP algorithm combined with WD consistently yielded the best solution in terms of computational time and accuracy under different noise conditions for the real-time RL application.Therefore, this combination will be used in the MPC implementation in the following section.

C. Application 2: Online Level Control in the Hybrid Three-Tank System via an MPC
The hybrid tank system consists of three identical cylindrical tanks, a storage tank, two pumps, three pressure (level) sensors, and nine valves, as shown in Fig. 3.The goal of this system is to control the level of the liquid at the desired setpoints in Tanks 1 and 3.In this study, only the left tank will be used.Since the proposed filter and the MPC are multivariate, the method can be extended to multiple tanks trivially.
Four different skew noises were synthetically introduced to the system with varying skewness and scale parameters to mimic realistic filtering problems with asymmetric noise [9], [10], Γ ψ ∈ {1, 2} and Σ ψ ∈ {5, 10}.The MPC employed the predicted state values that resulted from the Kalman and the proposed skew filter.The results are numerically given in Table V.As the table shows, the proposed method outperforms the Kalman filter in all the tested scenarios in both mse P and mse C .This happens because the additional parameters provide flexibility to the filter while the optimizers find the best CSN parameters and control actions.Because the proposed method improves the prediction first, the control performance is also improved since the MPC utilizes these improved predictions.Since the optimization method is SQP, the process can be controlled online without any computational overheads.
The experimental results showed that the proposed dimensionality reduction method could yield accurate estimation when WD is combined with SQP while avoiding the dimension increase problem.Due to the simplicity of the utilized approaches, the proposed methodology provides practical solutions to realtime estimation and control problems.Both experimental studies showed that the proposed method outperforms existing filtering techniques, which highlights the effectiveness of the method.

V. RESEARCH PROSPECTS AND CHALLENGES
This study has focused on state estimation and control in linear time-invariant systems.However, industrial setups might include several aspects, such as nonlinearities, operational drifts, multimodality, significant time delays, and other practical issues.Although the proposed methodology can be used when the effects of these aspects are insignificant, more advanced methodologies may be needed when the severity of these aspects increases.To address these aspects, several modifications such as linearization, using higher-order models, and consideration of multiple modes can be investigated in future work.Another extension to this work can be incorporating smoothing to improve the estimation accuracy, as mentioned in [11] and [14].Other challenges to be addressed are strong non-Gaussianity and outliers, which might require more complex distributions.As a result, the following can be considered in future research: nonlinearity, non-Gaussianity, outliers, multimodality, operational drifts, and time delays.

VI. CONCLUSION
Online controllers are required to enhance process safety and provide optimal production.However, these controllers utilize noisy measurements that are traditionally assumed to be Gaussian.Various industrial applications may involve skewed noises that can be represented by CSN distributions, whose recursive estimations become challenging as dimensionality increases.Although several attempts have been made to address this issue, optimal online estimation remains an open challenge.Inspired by the optimal transport theory, this study developed an online dimensionality reduction technique while considering the entire empirical distribution.In addition, to proposing a novel methodology for online skew state estimation, this work provided theoretical and empirical insights about the proposed method.After comparing different objective functions, it is found that a combination of WD and SQP provides the best performance in both estimation and control.Finally, the proposed scheme was implemented in two experimental setups for realtime RL and MPC applications.Overall, the proposed method can be beneficial for reducing the dimensionality of multivariate CSN distribution for online applications.Furthermore, it can be utilized in multivariate estimation and advanced control.

Algorithm 2 :Algorithm 3 :
Online MPC With Skew Dimensionality Reduction By Using The SQP Algorithm.Online RL Reward Filtering With Skew Dimensionality Reduction By Using The SQP Algorithm.

Fig. 1 .
Fig. 1.Experimental, pilot-scale PSV setup.The black liquid (a water mixture) represents the middlings, and the white liquid (oil) mimics the froth layer in the industrial setup.

Fig. 2 .
Fig. 2. Comparison of different (a) optimizers (b) filters for online state estimation under skew noise with (Γ = 10, Σ = 5).(a) Note that all the algorithms except the SQP algorithm diverged during estimation.The SQP algorithm yielded the best performance.(b) Median filter showed the worst performance, followed by the Kalman filter.Note that the proposed method significantly outperformed the other methods in terms of estimation accuracy.

Fig. 3 .
Fig. 3. Piping and instrumentation diagram of the hybrid three-tank system.In this study, only the left tank and the corresponding instruments (left pump, V1, V3, and V5, LT 1 ) are used for simplicity.
and y t ∈ R p are the state, action, and measurement, respectively.t,t , and ψ t , respectively, represent the discrete time step, state, and measurement noises.A t , B t , C t , D t , E t , and F t adjust the contribution of each element to the state/measurement and can be determined by using a proper system identification method.In practice, the true value of the state of interest x t may be unknown.Based on Bayes' theorem, Kalman filtering (KF) can be used to calculate the estimated state, xt , optimally in the presence of a Gaussian state and measurement noise.The filter, at each time step, estimates a mean and covariance for the state of interest given the noisy measurement.Note that if C t = 1 and D t = 0, then estimating ŷt is equivalent to estimating xt .The state space model can be simplified by assuming D t = 0 and

TABLE II COMPARISON
OF DIFFERENT OBJECTIVE FUNCTIONS TO MINIMIZE THE DIFFERENCE BETWEEN CSN 1 AND CSN 2 , I.E., TO BE USED AS d IN

TABLE IV COMPARISON
OF THE MSE ∓σ VALUES (IN PX.) OF THE PROPOSED CSN FILTER AND THE EXISTING FILTERS UNDER DIFFERENT NOISES OBTAINED FROM 11 RANDOM SEEDS

TABLE V PREDICTION
AND CONTROL ERRORS (IN CM) BY USING THE KALMAN FILTER AND THE PROPOSED FILTER