PROGNOSTIC AND HEALTH MANAGEMENT IN OCEAN ENERGY SYSTEM: A SELF-HEALING FRAMEWORK BASED ON REINFORCEMENT LEARNING

In this paper, to minimize the cost of ocean generator power production by optimizing the operation and maintenance (O&M) policy over an infinite time horizon, while accounting for the uncertainty of renewable sources and component failure behaviors, we develop a self-healing framework for ocean energy systems. It consists of three major modules: data manipulation, health assessment, and decision-making. Specifically, a graph-theoretic approach is first proposed for ocean generator health monitoring using multivariate time-series data; a reinforcement learning (RL) based technique then exploits the health states of the system to provide decision support for optimal O&M management.


INTRODUCTION
Research efforts have focused on harvesting electricity from renewable ocean energy in a commercially and technologically acceptable manner [1]. Owing to the harsh and remote working environment, one of the major issues in cost-effectively integrating renewable ocean energy into power grids is the prognostic and health management (PHM) of multiple offshore/inshore devices, which drives the need for Systems-Level Thinking in PHM systems [2]. This requires developing reliable self-prognostic and self-decision-making techniques that account for both the complexity of the asset and the uncertainties in its operational conditions, failure modes, degradation behaviors, external environment, etc.
In this paper, to minimize the cost of ocean generator power production by optimizing the operation and maintenance (O&M) policy over an infinite time horizon, while accounting for the uncertainty of renewable sources and component failure behaviors, we develop a self-healing framework for ocean energy systems, shown in Figure 1. It consists of three major modules: data manipulation, health assessment, and decision-making. Specifically, a graph-theoretic approach is first proposed for ocean generator health monitoring using multivariate time-series data; a reinforcement learning (RL) based technique then exploits the health states of the system to provide decision support for optimal O&M management.

SELF-HEALING PHM SYSTEM
A self-healing PHM system automatically integrates the results from a well-designed sensor network all the way through to the decision-making module, which provides support for the optimal use of O&M resources. The core of this strategy rests on: 1) accurately forecasting the onset of imminent health conditions or failures of critical components, and 2) efficiently identifying the root cause of failures once their effects have been detected. From this perspective, if health-condition/failure predictions can be made, the allocation of preventive or corrective actions can be scheduled in an optimal fashion.

DATA MANIPULATION & HEALTH ASSESSMENT
The aim of data manipulation is to represent a multivariate time-series system measurement as a lower-dimensional weighted and undirected network graph that contains sufficient degradation/failure signatures, in order to increase the efficiency and reliability of health assessment. The approach (Step 1: Data Segmentation; Step 2: Transform Data into Graph; Step 3: Topological Feature Extraction; Step 4: Identify the Health States from a Dictionary of Signatures) involves the following steps:
1. Segment the signal into windows corresponding to known statuses c_k, k = 1, ⋯, K, with each window an m × n matrix.
2. Transform the signal X into a weighted and undirected network graph G(X) ≡ (V, ℰ, W). The nodes V are the rows and columns of the symmetric similarity matrix S, whose pairwise entries are computed by a Mahalanobis kernel Ω for each window: S_ij = Ω(x_i, x_j), ∀ i, j ∈ {1, 2, ⋯, n}, and the correlation between each pair of nodes is indexed by the edges, i.e., the connection status ℰ and weights W.
3. Extract the spectral graph Laplacian matrix ℒ from X once it is transformed into a graph G(X). The transformation from the signal corresponding to status c_k to the spectral graph representation is G(X) = [ℒ_1 ⋯ ℒ_h], which is employed to capture the inherent dynamics of the signal.
4. Select an orthogonal subset of the graph Laplacian eigenvectors as a basis set B_k corresponding to health state c_k. Each signal segment is decomposed, by taking an inner product akin to a Fourier transform, into a set of coefficients C_k. Repeating this procedure for all statuses c_k, k = 1, 2, ⋯, K, a dictionary ℂ = [C_1 ⋯ C_K] can be formed.
5. Given an unknown signal segment y (m × n), obtain the candidate coefficient set by the inner product with each basis, that is, Ĉ = [ĉ_1 ⋯ ĉ_K]. Then compare each ĉ_k with the associated coefficients C_k (having the same label k) in the dictionary ℂ. The label assigned to y is the one with the minimum squared error e_k, i.e., k* = argmin_k e_k.
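As an illustration, the steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a Gaussian kernel stands in for the Mahalanobis kernel Ω, flattened Laplacians stand in for the full eigenvector decomposition, and all function and variable names are our own, not the paper's.

```python
import numpy as np

def graph_laplacian(window, gamma=1.0):
    """Map an (m x n) window to the Laplacian of its similarity graph.
    A Gaussian kernel stands in for the paper's Mahalanobis kernel
    (an illustrative assumption)."""
    X = window.T                                   # one row per channel
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-gamma * d2)                        # symmetric similarity matrix
    np.fill_diagonal(S, 0.0)                       # no self-loops
    return np.diag(S.sum(axis=1)) - S              # L = D - S

def fit_dictionary(windows_by_state, r=2):
    """For each known health state, stack flattened Laplacians of its
    windows and keep the top-r left singular vectors as an orthogonal
    basis -- the state's signature in the dictionary."""
    bases = {}
    for label, windows in windows_by_state.items():
        F = np.stack([graph_laplacian(w).ravel() for w in windows], axis=1)
        U, _, _ = np.linalg.svd(F, full_matrices=False)
        bases[label] = U[:, :r]
    return bases

def classify(window, bases):
    """Assign the label whose basis reconstructs the unknown window's
    Laplacian signature with minimum squared error."""
    f = graph_laplacian(window).ravel()
    errors = {k: np.linalg.norm(f - B @ (B.T @ f)) ** 2
              for k, B in bases.items()}
    return min(errors, key=errors.get)
```

Because each health state alters the correlation structure among sensor channels, its graph Laplacian occupies a distinct subspace, so the minimum-reconstruction-error rule separates the states.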

REINFORCEMENT DECISION-MAKING
Developing a reinforcement learning based decision-making module requires defining the environment and its stochastic behavior, the actions that the agent can take in every state of the environment, and their corresponding effects and generated rewards [3].
Environment state: Consider a system consisting of elements C = {1, …, |C|}, physically or functionally interconnected. The degrading elements d ∈ D ⊆ C are affected by independent degradation mechanisms obeying Markov processes that model the stochastic transitions from the current state s_d(t) to the next state s_d(t+1), where s_d(t) ∈ {1, ⋯, n_d}, ∀ d ∈ D, t = 1, ⋯, T. These degradation states are estimated by the health assessment module. At each time t, the system state vector reads s_t = [s_1(t), s_2(t), ⋯, s_|C|(t)]. The stochastic behavior of the environment is assumed to be completely defined by a transition probability matrix P^(d,a) for each element d = 1, ⋯, |D| and each action a ∈ A, whose entry p_ij^(d,a) represents the probability P(j | i, a) of element d transitioning from state i to state j, conditional on action a.
Actions: Actions can be performed on the system elements g ∈ G ⊆ C. The action vector at time t is a_t = [a_1(t), ⋯, a_|G|(t)] ∈ A. The action set includes operational actions (OM) as well as preventive maintenance (PM) and corrective maintenance (CM) actions. CM fixes an out-of-service faulty element back to an in-service healthy condition, while PM improves the condition of an in-service but degraded element. Additional constraints can be defined, considering that some actions are disallowed in particular states, e.g., CM is the only action allowed for failed elements. Both PM and CM are assumed to restore the healthy state of each degraded element (Figure 3).
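To make the transition model concrete, the sketch below builds per-element transition matrices conditioned on the action, following the assumption above that PM and CM restore the healthy state. The numerical probabilities are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative transition matrices P[a][i, j] = P(j | i, a) for one
# element with 4 degradation states (index 0 = healthy ... 3 = failed).
n = 4
P = {}
# Operational action: the element can only stay put or degrade one step.
P["OM"] = np.array([
    [0.90, 0.10, 0.00, 0.00],
    [0.00, 0.85, 0.15, 0.00],
    [0.00, 0.00, 0.80, 0.20],
    [0.00, 0.00, 0.00, 1.00],   # failed state is absorbing under OM
])
# PM on an in-service degraded element restores the healthy state.
P["PM"] = np.tile([1.0, 0.0, 0.0, 0.0], (n, 1))
# CM brings a failed element back in service, also to the healthy state.
P["CM"] = np.tile([1.0, 0.0, 0.0, 0.0], (n, 1))
```

Each row is a probability distribution over next states, so every row must sum to one; stacking one such matrix per (element, action) pair fully specifies the environment dynamics.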
Reinforcement learning: The goal of the agent in strategy optimization is to obtain the optimal action-value function, i.e., the maximum sum of rewards discounted by γ at each time step t, achievable by a behavior policy π = P(a | s), after making an observation s and taking an action a: Q*(s, a) = max_π E[r_t + γ r_(t+1) + γ² r_(t+2) + ⋯ | s_t = s, a_t = a, π]. The algorithm for training the Deep Q Network follows [3] and is shown in Algorithm 1.
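The action-value recursion above can be sketched with tabular Q-learning. The paper trains a Deep Q Network; this minimal tabular variant, on a hypothetical one-component maintenance toy problem with invented rewards, illustrates the same Bellman target.

```python
import numpy as np

# Toy environment (illustrative assumption): one component with states
# 0 = healthy, 1 = failed; actions 0 = operate (OM), 1 = repair (CM).
P = np.array([
    [[0.9, 0.1],    # operate while healthy: may degrade to failed
     [0.0, 1.0]],   # operate while failed: stays failed
    [[1.0, 0.0],    # repair from healthy: stays healthy
     [1.0, 0.0]],   # repair from failed: restored to healthy
])  # P[a, s, s'] = transition probability
R = np.array([
    [1.0, -2.0],    # operate: power revenue, or losses while failed
    [-0.5, -1.5],   # repair: maintenance cost
])  # R[a, s] = expected immediate reward

def q_learning(P, R, gamma=0.95, alpha=0.1, steps=5000, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s2 = rng.choice(n_states, p=P[a, s])
        # Bellman target: r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (R[a, s] + gamma * Q[s2].max() - Q[s, a])
        s = s2
    return Q

Q = q_learning(P, R)
policy = Q.argmax(axis=1)   # greedy policy: one action per state
```

With these costs the learned greedy policy operates the healthy component and repairs the failed one, mirroring the CM-only-when-failed constraint discussed above.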

CASE STUDY
The proposed self-healing framework is applied to a scaled-down ocean power system (Figure 4). The system consists of 2 controllable generators, 1 energy source providing electricity, 1 connected load depending on random conditions, and 4 transmission cables. The generators and cables 4 and 5 are under degradation and equipped with PHM capabilities to inform the decision-maker of their states. We consider 4 degradation states for the generators, S_g = {1, 2, 3, 4}, g = 1, 2, shown in Figure 3. For the load/energy source, we consider 3 states of rising power demand/production. For the cables, 3 degrading states are defined. We assume that both generators have identical transition probability matrices (similar to [4]) and that the cables' degradation is described by the same Markov process. Hence, s_t = [s_1, s_2, s_3, s_4, s_5, s_6] and the state space is made up of 1296 points. We define 5 actions (3 OM, 1 PM, 1 CM) that can be applied to the generators while keeping the system's structural and functional integrity. The action vector reads a_t = [a_1, a_2], a_g ∈ {1, …, 5}, g = 1, 2. This gives rise to 32400 state-action pairs. Each action has a specific transition probability matrix describing the generator degradation conditioned on its operative state or maintenance action. The case-specific reward comprises 3 contributions: the revenue from the power demanded from the ocean generators, the cost of producing electricity by the ocean generators, and the cost of the performed actions. Following the surrounding definitions, it can be written as r_t = δ Σ_g P_g(t) − Σ_g c_g P_g(t) − Σ_g c_(a_g)(t), where δ is the unit price of ocean generator produced power, P_g is the power produced by ocean generator g with unit cost c_g, and c_(a_g) is the cost of action a_g.
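The state and state-action counts quoted above follow directly from the per-element state cardinalities and can be checked in one line each:

```python
# Case-study dimensions: 2 generators with 4 degradation states each,
# load and energy source with 3 states each, 2 degrading cables with
# 3 states each; 5 actions (3 OM, 1 PM, 1 CM) per controllable generator.
n_states = 4 * 4 * 3 * 3 * 3 * 3          # Cartesian product of element states
n_state_action = n_states * 5 * 5         # 25 joint actions for 2 generators
print(n_states, n_state_action)           # 1296 32400
```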

RESULTS
The ocean generator condition-identification results, achieving high accuracy (Table 1), verify that the proposed graph-theoretic method is a reliable health assessment approach. The RL results are summarized in Figure 5 by visualizing the distribution of Q*(s, a) over the states for all combinations of actions a = [a_1, a_2]. According to the empirical CDF, we can identify three clusters: the set of states (1 curve) for which both generators are under CM, the set (8 curves) for which only one of the generators is under CM, and the set (16 curves) involving only PM and operational actions. CM is a costly action and leads to a negative expected reward, whereas PM and operational actions lead to higher, positive expected rewards.

CONCLUSIONS
The framework is demonstrated on a scaled-down ocean generator powering a load, showing that the proposed method can effectively identify the system's operational condition and produce efficient solutions for O&M management.