Bilateral Sensitivity Analysis for Understandable Neural Networks and its application to Reservoir Engineering

—In this paper, a model-independent sensitivity analysis for (deep) neural network, Bilateral Sensitivity Analysis (BiSA) , is proposed to measure the relationship between neurons and layers. Both the BiSA between pair of layers and the BiSA between any pair neurons in different layers are deﬁned for (deep) neural networks. This sensitivity can measure the inﬂuence or contribution from any layer to another layer behind this layer in the (deep) neural networks. It provides a helpful tool to interpret the learned model. The BiSA can also measure the inﬂuence or contribution from any neuron to another neuron in a subsequent layer and is critical to analyze the relationship between neurons in different layers. Then the BiSA from any input to any output of a network is easily deﬁned to assess the connections between the inputs and outputs. The proposed BiSA of (deep) neural networks is then applied to characterize the well connectivity in reservoir engineering. Given a network trained by Water Injection Rates (WIRs) and Liquid Production Rates (LPRs) data, the well connectivity can be efﬁciently described through BiSA . The empirical results verify the effectiveness of the proposed method. The comparisons with the exiting methods demonstrate the robustness and the superior performance of the proposed method.

I. INTRODUCTION (Deep) neural networks have shown powerful capabilities and effectiveness in realizing intelligent systems [1]- [3]. Despite this, the "black-box" nature still limits its applications and becomes the main reason of the refusal by many application areas [4]. In reservoir engineering, for instance, the researchers question the reliability of neural networks in real oil field handling, which has a direct effect on the cost of production and productivity. Researchers need machine learning models that they can understand and interpret.
Besides, they are eager for meaningful interactions with the models. Recently, several attempts have been made to develop understandable machine learning models that can explain the insights of the models to improve model transparency and obtain thorough insights into what is learned by the models [5]. This issue has been addressed by various means such as extracting symbolic rules [6], [7], visualization [8], and quantitative analysis [19]. However, explaining neural networks, especially deep neural networks, is a hard issue, even simple properties of the models are difficult to describe using strictly theoretical inference. Hence, more efforts are needed to clarify the internal mechanism of the mighty (deep) neural networks.
Sensitivity analysis (SA) is a process that determines the impact of an input variable on the output of a model [9], [10]. The most widely adopted definition of SA is the one introduced by Saltelli et al. [11] as follows: the study of how uncertainty in the output of a model can be apportioned to different sources of uncertainty in the model input. Sensitivity in a learning system refers to the change in the output of the network corresponding to a small change in the parameter(s) under study. Sensitivity analysis, as a fundamental part of complex system analysis, has been proven to be an efficient tool to interpret the awareness of neural networks [12]- [14]. It opens the "black-box" by investigating the first-hand impacts of the parameter(s) on the outputs for classification or regression tasks, and helps to understand the learned relationship between the input parameters and the output variables. SA has been used to analyze the adaptive linear element (Adaline) network as early as 1962 [15]. After that, a series of works have been developed to discover the characteristics of sensitivity for various neural networks, such as Adaline-based networks [16], multi-layer perceptron (MLP) neural networks [17], [18], and deep neural networks [19]. In [20], the sensitivity analysis of ANNs is described as a partial derivative of the output to the input of the network. Some other literatures focus on the variance of the output error under certain assumptions [17], [18]. Most works have implemented sensitivity analysis either on input or weight perturbations for a single neuron taken from an MLP network. In [17], sensitivity analysis is performed on the output layer's error, sequentially computing the sensitivity neuron by neuron from the first to the last layer.
The neural network sensitivity analysis methods are developed in two major streams: partial derivative SA (PD-SA) and stochastic SA (ST-SA). The PD-SA method [20]- [26] primarily quantifies the significance of the input parameters to the model output based on the differentiation of the input variables from the output variables. A standard approach in sensitivity analysis is to compute the partial derivative of outputs with respect to its input vector via the chain rule of the derivatives, computing one layer at a time (Hashem [20] and Fu and Chen [21]). On the other hand, the PD-SA computes the sensitivity of the MLP neural network output according to input perturbation which approaches zero [23], [27]. It requires an assumption that the input perturbations are random variables with the maximum absolute value less than or equal to a calculated Q-value. This is not applicable to common models owing to the difficulty of computing the Qvalue automatically. In [23], a PD-SA is applied to quantify the relevance of input and hidden neurons of feedforward neural networks. A variance-based pruning heuristic is proposed to determine which neurons to remove. The partial derivative sensitivity analysis computes the changes in the network output with respect to perturbations of the parameters.
The PD-SA is often done based on the approximation of the changes in the outputs with respect to the input or weight perturbations. For the m-th input of the j-th sample with perturbations, the Taylor expansion of the output changes is as follows [15]: where X m = (0, · · · , x m , · · · , 0), x m >0 is a very small fixed number for each feature. Then the PD-SA is defined as the first order derivative,f w X (j) , based on the assumption that the part 1 2f w X (j) ( x m ) 2 + · · · is very small. PD-SA is sometimes represented asf w X (j) if f w X (j) ≈ 0. In [10], the PD-SA for radial basis function neural network (RBFNN) with regard to the m-th input of the j-th sample is defined as where X (j) represents the j-th sample, x m is the m-th input, w i is the weight connecting the i-th hidden node, U i and v i are the center and width of the i-th basis function, respectively. This kind of method has two main weaknesses. First, it cannot deal with networks with non-differentiable activation functions, and second, it does not consider the magnitude of the parameters. The stochastic sensitivity analysis (ST-SA) uses the magnitudes of the output perturbations between the original samples and perturbed samples, in a statistical sense. Because of its high computational efficiency, and as long as the function expression between the output variable and the input parameter is given, the sensitivity of the output variable in different input parameters can be effectively analyzed. So it is widely used in sensitivity analysis. The ST-SA for neural network is defined by the magnitudes of the change in the output with respect to weight or input perturbations. Compared to PD-SA, where the rate of output change is used to approximate the sensitivity corresponding to parameter perturbations, the ST-SA possesses several advantages. First, the ST-SA does not evaluate each training sample one by one and constrains less on the used perturbations. It measures the differences between the original outputs and the perturbed outputs based on the analytical formulas for the neural network. Besides, ST-SA is more interpretable, and it is more meaningful for feature or instance selection. Finally, ST-SA is a straightforward manner reflects the generalization error because of the usage of magnitudes of output error produced by the parameter perturbations, which is a direct reflection of the generalization. Hence, the sensitivity analysis for neural networks in this paper is defined in a stochastic manner.
In this paper, we first propose a Bilateral Sensitivity Analysis (BiSA) scheme for the (deep) neural networks, then apply it to an important issue of characterizing the well connectivity in reservoir engineering.
The main contributions of this paper are: (i) The model-independent BiSA between any pair of layers in a neural network is defined; (ii) The model-independent BiSA between any pair of neurons where the members of the pair are from different layers in a neural network is defined; (iii) The BiSA from an input to the output of a neural network is given; (iv) The proposed BiSA of neural networks is applied to characterize the well connectivity in reservoir engineering.
The remainder of this paper is organized as follows. Some related works on stochastic sensitivity analysis and inferring well connectivity are investigated in Section II. In Section III, the Bilateral Sensitivity is proposed for both between a pair of laye rs and between as pair of neurons from different layers. The reservoir connectivity is characterized using the Bilateral Sensitivity in Section IV. Some useful conclusions of this work are given in Section V.

II. RELATED WORKS
In this section, first, a brief survey of stochastic sensitivity of neural networks is investigated to make clear the diverse definitions. Then some basic knowledge of well connectivity in reservoir engineering is given to make readers ready for a specific application of the proposed BiSA method.

A. The stochastic sensitivity analysis of neural networks
To the best of our knowledge, the sentivity of neural networks was first discussed by Hoff in [28] to analyze the output changes of Adalines generated by weight perturbations and later it was extended in [15], [16], [30]. Stevenson et al. [16] studied the sensitivity of multi-layered networks (Madaline) to weight errors, input errors, and the combination of input and weight errors. For instance, the probability of decision error was expressed in term of the ratio for weight error. Similarly the probability of a decision error was approximated by a simple expression involving the weight and input perturbation ratios due to the combined effect of weight and input perturbations.
In [16], [18] and [29], the statistical sensitivity to weight perturbations W is defined as where y is the output changes due to weight perturbations, σ is the standard deviation of each component of W , and var[·] is the variance of [·]. Piché [30] introduced the direct statistical analysis to determine the sensitivity of a neural network to perturbations of weights. It provided understandable consequence of neurons by the variance of the output error for Madalines according to weight perturbations. This is a fundamental work for ST-SA, where the output error of the n-th Adaline node in layer l caused by the input and weight perturbations is defined as where X l is the perturbation on the input to the l-th layer, X l , and W p l is the perturbations to the weight W l associated with the n-th Adaline node in layer l. Then the sensitivity of the l-th layer output to perturbations in the weights is represented by the noise-to-signal ratio (NSR), which is defined as the ratio of the variance of the l-th layer's output error to the variance of the output of the l-th layer: In [31], Alippi et al. introduced a sensitivity to errors in neural networks using probability as follows: where p δy l |δX l is the conditional probability that a relative output error δy l is generated whenever the input relative error is δX l . One important definition of stochastic sensitivity is proposed by Sobol [36] considering the variance with respect to the parameters and the total variance of the output. Sobol defined the global sensitivity indices as as follows: where V (f I ) is the variance corresponding to the indices I, V (f ) is the sum of all the first-order and higher-order terms added up to the total unconditional variance. Saltelli et al. [11], [32], [33] improved the estimation procedure for computing variances based on the Fourier amplitude sensitivity test (FAST), where the sensitivity is computed through where E(Y |X h ) represents the expectation of Y conditional on X = x h , Var X h denotes , Y is the output, and x h is the h-th input factor. Fock [34] did further improvements on the extended FAST method. Furthermore, a global sensitivity analysis based on the ANOVA decomposition was introduced in [35] as an alternative way of measuring the sensitivity indices of Sobol decomposition [36] for neural network classifiers. However, two conditions, limited variable range and squareintegrable classification function, are required to be imposed on the classification function.
Stevenson et al. [16] used the probability of decision error to describe sensitivity but the magnitude of the error was not considered. This was enhanced by Cheng and Yeung [37] in neocognitron model but still could be directly used in MLP. With antisymmetric squashing activation function, Yeung et al. [38] extended Piche's method [30] and removed the independently identically distributed (i.i.d) restriction on the input and output errors. In [17], sensitivity is defined as the expectation of the output errors with respect to the input and weight perturbations in a continuous interval. Both the sensitivity for a single neuron and of the entire multi-layer perceptron network are discussed. However, the computation is highly complex, especially for the tasks with high dimensional data. Shi et al. [39] generalized this method to radial basis function (RBF) networks, later Yeung et al. [40] introduced a localized generalization error model (L-GEM) within a so-called Q-neighborhood of the training samples. However, this method does not calculate the sensitivity for individual instances. In [41] another Q-neighborhood based sensitivity was introduced to assess sensitivity for individual instances of the imbalanced classification problem. Recently, a stochastic sensitivity [42] based on L-GEM was introduced to provide a straightforward measure on an MLP's output fluctuations.
Ng et al. [43] utilized a localized generalization error model to create a novel hybrid filter-wrapper-type feature selection methodology. This solution gives the possibility of removing a large percentage of features causing a statistically insignificant loss in the accuracy. The work presented in [44] defined the stochastic sensitivity as the expected value of square of the changes in the classifier output with respect to feature perturbations. This, in turn, allows the authors to crate a radial basis function neural network that can be trained by minimizing the defined sensitivity.
The expectation based methods are also combined with the variance to achieve a better performance. For the jth instance X (j) with the perturbation on the m-th feature, X (j) m ∈ R, the sensitivity of this feature is given based on the difference of the network output where y (j) is the output changes for the j-th sample, W is the weight perturbation on weight W . For instance, the ST-SA for RBFNN in [15] is defined as the combination of expectation and variance: where y is the vector of output changes for all samples, c is a positive constant, E( y) is the expectation and V ar( y) is the variance of the output changes.
Recently, Karmakar et al. [45] exploited the statistical sensitivity in [18] as a penalty in the objective function of an MLP neural network to enable the network to say "Don't know". Xiang et al. [46] introduced the maximum sensitivity by a bounded disturbance on the nominal input to measure the maximum deviation of outputs. Li et al. [47] studied the deviation of functions represented by DNN from their typical mean field solutions by the large deviation theory and path integral analysis, where the commonly used weight sparsification and binarization in model simplification were investigated under parameter perturbations. In [48], a provable Sensitivity-informed Provable Pruning (SiPPing) method of neural networks was suggested based on a ST-SA of measuring the importance of each weight for one layer. The sensitivity of a weight was defined as the proportion of the signal delivered by this weight to the whole signal of that layer. Motivated by information geometry, Shu et al. [49] introduced a stochastic sensitivity analysis for DNN classifiers using a perturbation manifold. To interpret and to enhance the adversarial robustness of DNNs, a sensitivity measure of a neuron was defined by measuring the intensity of variation of neuron's behavior against benign and adversarial examples [50]. Given an example x j , (j = 1, 2, · · · , J) and its corresponding adversarial example x j , the sensitivity for the m-th neuron in the lth layer was computed as where F m l (x j ) and F m l (x j ) are the corresponding outputs of neuron F m l . These sensitivity methods provide a deeper understanding of the (deep) neural networks. Nevertheless, it is necessary to study the sensitivity analysis from the perspective of a general view and provide the sensitivity analysis for both the bilateral neurons and layers. Hence, this paper focuses on contributing a model-independent bilateral sensitivity to evaluate the contribution from one neuron to another neuron or from one layer to another layer. It is easily extended to the cases from one neuron to one layer or from one layer to a neuron. This is essential to deeply understand and interpret the mechanism of the (deep) neural network.

B. Reservoir connectivity
In this paper, we will test the proposed bilateral sensitivity on the task of inferring well connectivity in reservoir engineering, which is a hard and classic inverse-problem in engineering. Well connectivity remains a vital research topic in the petroleum industry, where waterflood is commonly used as the secondary recovery technique to generate man-made oildisplacing energy. Well connectivity reveals the connectivity between the injectors and producers to guide optimizing operations and maximize oil production. Usually, it is identified by the recorded production and injection data. However, the issue is very tough owing to its non-stationary and non-linear nature [51]. Fortunately, abundance of production data such as injector and producer flow rates can be easily obtained even for marginal oil fields. These production data contain the information of well connectivity. Various methods using production data to infer connectivity have been developed. For instance, the injection and liquid production rates are used in a linear multivariate regression (MLR) model by employing capacitance model to characterize the well connectivity in [52], [53]. Wang et al. [54] introduced a method with signal processing techniques to represent connectivity. Recent works try to deal with this problem using machine learning techniques [55], [56], but the potential power of machine learning has not yet been exploited. The lack of interpretability of neural networks also limited the interest of reservoir engineer's to explore further the use of NNs on this task.
The sensitivity analysis has also been applied to the topic of reservoir connectivity inference. In [57], Albertoni et al. employed sensitivity analysis to multivariate linear regression (MLR) to infer the interwell connectivity. Demiryurek et al. [58] simulated the sensitivity analysis of neural network to analyze the connectivity between injector and producer in fields and similar approach was employed in [56]. However, both works use the identical SA method defined in [18], which is a partial derivative sensitivity analysis. The proposed BiSA in this paper is based on a stochastic manner, which is more suitable for well connectivity because the observed production data are not continuous basically.

NEURAL NETWORKS
In this section, we define the Bilateral Sensitivity Analysis for (deep) MLP neural networks. In fact, it is modelindependent and can be regarded as a measure for other kinds of models too. Assume that an MLP has (L + 1) layers, as shown in Fig. 1, and the l-th (0 ≤ l ≤ L) layer possesses n l neurons. Especially, n 0 represents the number of features in the input layer when l = 0 and n L indicates the number of outputs in the output layer when l = L. For the i-th (1 ≤ i ≤ n l ) neuron in layer l, X l = (x l 1 , x l 2 , · · · , x l n l−1 ) T is the input vector, W l i = (w i1 , w i2 , · · · , w in l−1 ) T is the corresponding connection weights, θ l = (θ l 1 , θ l 2 , · · · , θ l n l ) T is the bias vector of the l-th layer, then the output of the l-th layer is Here, we define the operator φ as Then we have The input to the l-th layer is exactly the output of the (l−1)-th layer, thus for an MLP, the mapping from its beginning input to its final output can be represented as where Assume that the red node in the lm-th (0 ≤ lm < L) layer is the i-th neuron in this layer, and the yellow node in the ln-th (0 < ln ≤ L, lm < ln) layer is the j-th neuron in this layer. The Bilateral Sensitivity of the two neurons is investigated in Section III.

A. Definition of Bilateral Sensitivity (BiS)
For the input vector of l m (0 ≤ l m < L) layer, X lm ∈ [0, 1] n lm −1 , it is the output of the (l m − 1) layer, thus then the output of the l m layer is Similarly, for the input vector of l n -th (0 < l n ≤ L, l m < l n ) layer, X ln ∈ [0, 1] n ln −1 and we have then the output of the l n layer is Based on these, the Bilateral Sensitivity of the l m -th layer to the l n -th layer can be defined using the perturbations on the inputs of the l m -th layer as follows.

Definition 1. Given the perturbations on the input vector of
, then the Bilateral Sensitivity of the l m -th layer to the l n -th layer (l m < l n ) is defined as This sensitivity can measure the influence or contribution from any layer to another later layer in a neural network including deep neural networks. It provides a helpful tool to interpret the the learned deep model. Here, the Bilateral Sensitivity BiS (l m , l n ) describes the relation between the l mth layer and the l n -th layer. The j-th element in the output of the l n -th layer is then the Bilateral Sensitivity of the i-th neuron in the l m -th layer to the j-th neuron in the l n -th layer can be given through perturbing the i-th neuron in the l m -th layer as follows.
Definition 2. Given the perturbation on the i-th input of the , the Bilateral Sensitivity of the i-th neuron in the l m -th layer to the j-th neuron in the l n -th layer is defined as This sensitivity can measure the influence or contribution from any neuron to any other neuron in any higher layer. This is critical to analyze the relationship from one neuron to another. The perturbation on the i-th neuron of the l m -th layer, X lm i , affects the whole subsequent network including the j-th neuron in the l n layer, as shown in Fig. 1. The vector W ln j = W ln (j, 1), W ln (j, 2), · · · , W ln (j, n ln−1 ) is the jth row vector of weight matrix W ln , i.e. the weight vector connecting all the neurons in the (l n -1)-th layer to the j-th neuron in the l n -th layer, as shown by the orange lines in Fig.  1 for example.
Subsequently, the relation between the i-th input and the j-th output for an MLP with L layers, namely the Bilateral Sensitivity of the i-th input in the 0-th layer to the j-th output in the L-th layer, is given by the following definition.
The proposed Bilateral Sensitivities are calculated after the network being trained and are independent of the optimization method used to train the network. Hence, the sensitivity indices are model-independent. They are calculated easily and efficiently using the optimized weights and biases. The calculation is only a chain of l (0 < l < L) operations consisting of multiplication, addition (if bias is used), and activation function. In this paper, the perturbations are generated by Gaussian noise. After training the network, the final weights capture the patterns existing in the training data. In this case, it is easy to compute the Bilateral Sensitivity for each pair, which reflects the connectivity between the corresponding injectionproduction wells through perturbing the water injection rates.

A. Inferring well connectivity
Here we consider an MLP with single hidden layer, as shown in Fig. 2, where no bias is used to simplify the discussion. The input vector, I = (I 1 , I 2 , · · · , I M ) T , consists of the injection rates from M water injection wells. The output vector, P = (P 1 , P 2 , · · · , P N ) T , involves the liquid production rates from N production wells. We train the network with the reservoir history data, i.e., the water injection rates (WIRs) for the water injectors as inputs and the liquid production rates (LPRs) for the producers as outputs. For a highly connected injector-producer pair, I m (1 ≤ m ≤ M ) and P n (1 ≤ n ≤ N ) for instance, LP R n (the liquid production rate of P n ) is expected to vary in accordance with the alteration of W IR m (the water injection rate of I m ). Nevertheless, the consistency tends to hardly occur for a poorly connected pair. Perturbation creation. After training, with the learned knowledge from the training history data, the relationship between input variables and target properties can be analyzed by the Bilateral Sensitivity Analysis. For the neural network with only one hidden layer in Fig. 2, the connectivity between the m-th injector and the n-th producer is defined by the Bilateral Sensitivity from the m-th input to the n-th output: (24) where I = (0, · · · , 0, I m , 0, · · · , 0) M ×J , the perturbation I m = ( I m1 , I m2 , · · · , I mj , · · · , I mJ ) 1×J . Suppose, I m = (i m1 , i m2 , · · · , i mj , · · · , i mJ ) is the m-th input vector for all training samples, std(I m ) is its standard deviation, then the j-th element in I m is created by where γ ∈ [0, 1] represents a percentage to control the difference between standard deviations of the generated perturbation and the original input data. In this paper, we let γ = 10% and 20% respectively to generate noises. We adopt such a strategy because if the variance of an input is very high, then it should be tolerant to larger level of noise perturbation. Data source. Two synthetic reservoir scenarios in petroleum engineering are used throughout this paper. One is from [55], the other is from a simulation of a complex reservoir scenario. Although both cases have relatively simple permeability, they are typical and universal examples of the real cases in practical applications. The understanding of these characteristics are quite crucial to the design of the water flooding scheme. These scenarios have been built and run by a commercial reservoir simulator, Eclipse 2011. These kinds of synthetic models have been proven to be useful in a number of works [55] and are widely used in reservoir engineering. The first case is defined as a Streak Case, which represents reservoirs with high-permeability streaks. The second one is defined as Braided River, which represents complex reservoirs which contain several high permeability channels. The proposed methods, INNGLP and CINNGLP, are tested on both cases with the network architecture in Fig. 2. We take the water injection rates of each injection as the network input and liquid production rates of each producer as the network output. All the reported results are the average ones obtained by 10 repeats of the experiments.
Global normalization. Normalization plays an important role in the pre-processing of data for a given task. Minmax and z-score normalization are some of the popular techniques used for relevance score normalization. For a good normalization scheme, the estimates of the location and scale parameters of the matching score distribution must be robust and efficient. Robustness refers to insensitivity to the presence of outliers. Efficiency refers to the proximity of the obtained estimate to the optimal estimate when the distribution of the data is known [59]. Here we use z-score normalization on the input data of the neural network, namely the water injection rates, while Min-Max normalization on the output data. Z-score normalization is a strategy of normalizing data that can avoid the outlier issue [60]. The z-score normalization is defined as below: where µ is the mean value of the original data vector X and σ is the standard deviation of the vector. For a variable, if all the values are equal (i.e., variance is equal to zero) then the normaized value is set to zero. Note that a global z-score, instead of column (or feature-wise) z-score, is implemented here to maintain the difference of the inputs (water injection rates) at different sources. However, z-score normalization is not suitable for the output data due to use of the tansig activation function at the output layer of the neural network, where the network output is limited from -1 to 1. Hence a global Min-Max normalization is used for the output data, i.e., the liquid production rates.

B. Case Study 1: Streak Case
The Streak Case is a public synthetic field case and its detailed description is available in [55], [61]. As shown in Fig. 3, it is a square reservoir made up of 5 vertical injectors and 4 vertical producers, represented by I1, I2, I3, I4, I5 and P1, P2, P3, P4, respectively. The permeability of the two high-permeability streaks I1-P1 (the streak between I1 and P1) and I3-P4 (the streak between I3 and P4) are set to be 1000 md and 500 md, respectively. The permeability for the rest of the reservoir is 5 md. Note that the permeability of the underground is not known in real applications. Both rock and water compressibilities are 1 * 10 −6 psi −1 . The oil compressibility is 5 * 10 −6 psi −1 . The porosity is 0.18 and the total mobility is 0.45 independent of saturation.
After being trained by the water injection rates and liquid production rates, the neural network has learned the knowledge from the input and output patterns. This means a nonlinear mapping from the inputs to the outputs, which reflects the knowledge of the reservoir, is established based on the learned weights. In this case, we have used the multiplicative Gaussian noise to perturb the inputs and obtained the Bilateral Sensitivity between each injector and producer pair. Fig. 4 shows the results of the proposed BiSA method with addictive 10% Gaussian noise (with σ = standard deviation of input data * 0.1) perturbation on the water injection rates of the Streak Case scenario, where the darker the color the stronger the connectivity. This heat-map clearly demonstrates the two streaks of high connectivity in Streak Case, i.e. I1-P1 and I3-P4, and the quite low connectivities for the other pairs. To verify the robustness of the proposed method, other noise levels are also tested in our experiments. Fig. 5 shows the results of the proposed BiSA method with multiplicative 20% Gaussian noise perturbation on the water injection rates of the Streak Case scenario. With this stronger perturbations, the heat-map more clearly reflects the two streaks of high connectivity, I1-P1 and I3-P4, and the low connectivities for the other pairs. This demonstrates the robustness of using the multiplicative Gaussian noise as the perturbation.
To explain the mechanism explicitly, we analyze it from the training data and the perturbations. Fig. 6 shows curves of the original I1 (the water injection rates for injection 1), 10% Gaussion noised I1, and the 20% Gaussion noised I1. With these perturbations, we turn to the corresponding outputs of the learned model. Fig. 7 shows the curves of the original P1 (the liquid production rates for producer 1), the output for producer 1 according to 10% Gaussion noised I1, and the output for producer 1 with regard to the 20% Gaussion noised I1, respectively. Obvious changes can be observed for the values of this output with both 10% and 20% Gaussion noised I1. This is because the network learned the knowledge of the   The results for the BiSA method with 20 percent Gaussian multiplicative noise perturbation on the Streak Case scenario. strongly connected streak between I1 and P1, which means the I1 affects largely P1 and the changes of I1 directly lead to the changes in P1. As a result, the computed BiS(1, 1, 3, 1) in (22) is a large value of 0.8551 and 1.72 for 10% and 20% percent Gaussian multiplicative noises, respectively. Here, the values in heat-map indicate the relative magnitudes of connectivities, which means we cannot compare the values in different heatmaps. Note that normalization can be implemented on the final calculated connectivities, if necessary.
Conversely, the network outputs for P2, P3, and P4 are   slightly changed with regard to the perturbations on I1, as shown in Figs. 8, 9, and 10. This is because the network has learned the low dependency of the pairs I1-P2, I1-P3, and I1-P4, which means I1 has little contributions to P2, P3, and P4. Hence, even if we add 20% percent Gaussian noise to I1, the obtained outputs of P2, P3, and P4 are still not perturbed much in the learned model. This leads to the low values of BiS(1, m, 3, n) (m = 1, 2, · · · , M and n = 1, 2, · · · , N ) in (22), as shown in the figures. Next we perturb I3 also with respect 10% and 20% percent Gaussian multiplicative noises, as shown in Fig. 11, the graphs    demonstrate consistent effects for the corresponding network outputs of the producers. Fig. 12 shows the original output of P1, the perturbed P1 with 10% percent noise, and the perturbed P1 with 20% percent noise, respectively. The three curves almost coincided together. This indicates that I3 contributes nearly no energy to P1 and hence, I3-P1 is judged to be a very low connected pair. Consequently, the calculated connection strength is significantly low having a value of 0.01133 and 0.02338 in the two heat-maps, respectively. Both values can even be ignored with comparison to the connectivity of I1-P1.   Similar results can be observed for P2 and P3 as shown in Figs. 13 and 14, respectively. However, the obtained outputs of P4 according to the perturbations change a lot to the original output of P4 as shown in Fig. 15. This is due to the learned information in the training patterns that I3 has a high correlation (connectivity) to P4. Consequently, the computed BiS is relatively large as 0.3873 and 0.7564 respectively with regard to the two perturbations.

C. Case Study 2: Braided River
Braided River is a complex scenario with several predominant pathways of water-flooding, 5 injectors (I1, I2, I3, I4, and I5) and 4 producers (P1, P2, P3, and P4). It is a conventional fluvial deposition widely distributed in the continental facies basin. The permeability of the river channel is very high yet those of the other areas are quite low. This case is composed of 100×100 single-layer grids on the horizontal plane, each with a size of 80f t × 80f t × 12f t. The initial oil saturation is set to 0.7 and the porosity is is set to 0.18. In this case, the permeability of the river channels is set to 1000 md while that of other areas is set to 50 md. As shown in Fig. 16, I1 and P1 are located in the river channel on the top left. P2, P3, P4 are situated in three different channels respectively and I5 is located in the channel on the right bottom part. Fig. 17 shows the results of the proposed BiSA method for the Braided River scenario with multiplicative 10% Gaussian noise perturbation on the water injection rates, where a darker color reflects a stronger connectivity. From the heat-map, the highly connected well pairs can be clearly observed. The highest one is I1-P1 with the connectivity value of 16.4. This is exactly the reflection of the channel-connected situation of I1 and P1 in the Braided River scenario. For pairs I5-P2, I5-P3, and I3-P4, the obtained connectivities are also higher with values 12.16, 8.367, and 6.057, respectively. They are also consistent with the actual situation in Fig. 17, where P2, P3, and P4 are well connected to I5 by three individual channels and I5-P2 is the highest among them. Nevertheless, the connectivities are quite low for the other pairs especially for I2, I3, and I4, caused by that all of these injectors are located in the low permeability area far from the river channels. Fig. 18 shows the results of the proposed BiSA method with addictive 20% Gaussian noise perturbation on the water injection rates of the Streak Case scenario. With stronger perturbation, the relative magnitude for each injectorproducer pair are suitably represented as well. The four highly connected pairs, namely I1-P1, I5-P2, I5-P3, and I5-P4, are prominently characterized by the proposed methods. All these results are consistent with the real situation in Braided River scenario. These results clearly demonstrate the utility of the proposed methods.

D. Comparisons with the current methods
Comparisons with the latest works in [55] and [52] are conducted to further validate the effectiveness of BiSA. Table  I demonstrates the comparison results by the ANN method in [55], the capacitance resistance model (CRM) in [52], BiSA with 10% Gaussian noise, and BiSA with 20% Gaussian noise on the Streak Case scenario in Fig. 3. Although the original values without normalization for BiSA provide intuitive description, the used results of the proposed BiSA method in Table I are normalized to range [0,1] to make an explicit contrast.
From the table, both ANN and CRM methods can identify the two streaks of high permeability, however, the real characteristics are not appropriately depicted. For CRM method, the obtained connectivity of I3-P4 (0.974) is fairly close to that of I1-P1 (1.0). This is not a proper reflection of the fact that I1-P1 has about double permeability compared to I3-P4. Similarly, for the ANN method, both I1-P1 and I3-P4 have very high and similar values and more surprisingly I1-P1 exhibits a weather connection than I3-P4! Besides, the characterizations for some other inter-well pairs are not accurate as well, as shown by the values marked in red. From the description of the Streak Case, we know that except for the pairs I1-P1 and I3-P4, connectivities for the other injector-producer pairs are practically non-existence. However, the acquired connectivities for I3-P3, I4-P2, I4-P4, I5-P2, I5-P3,I5-P4 by the ANN method and those for I2-P1, I2-P4, I4-P4, I5-P4 by the CRM method are noticeably high. These results can mislead the operations in real production. For the ANN method, for instance, the obtained connectivity of I4-P2 is 0.580, which makes an illusion that this pair has a reasonably strong connection between them. However, both the two streaks and the other injector-producer pairs are appropriately characterized by the proposed BiSA method. From Table I, we can see that for the proposed method, the connectivity of I1-P1 is the most dominant indicating a high transmissibility path while I3-P4 is the second one having almost the half of the connectivity value of I1-P1. Also, the obtained values for other pairs are very close to zero. This reflects the real situation as in the Streak Case scenario. Moreover, the proposed BiSA method is more stable and trustable than ANN method in [55], which is greatly affected by the initial weights. Finally, the BiSA method is model-independent and quite efficient in computation, compared to the other two methods. All these results demonstrate the effectiveness of the proposed BiSA method.
In reservoir engineering, connectivity diagram is a commonly used tool to exhibit the inter-well connectivity. The results of BiSA can be easily changed into a diagram as shown in Fig. 19, where the left one shows the results of ANN on the Streak Case scenario and the right one shows the connectivity diagram for BiSA with 20% Gaussian noise. The performance for these two methods can be easily compared from these views. If a policymaker develops the injectionproduction scheme depending on the left connectivity diagram, he or she may make unreasonable decisions (undesirable exploration) especially for I4 and I5. Because this diagram shows superior channel connected to both injections. Conversely, the right diagram for BiSA is clear and easy to understand. The policymaker can gain a more accurate understanding of the connectivity in this field. These results further demonstrate the effectiveness of the proposed BiSA method.
V. CONCLUSION In this paper, a stochastic sensitivity analysis method for (deep) neural networks, Bilateral Sensitivity, is proposed to measure the relationship between layers and neurons. Both the Bilateral Sensitivity between any layer pair and the Bilateral Sensitivity between any neuron pair of different layers in deep neural networks are defined. Then the Bilateral Sensitivity from an input to an output of a multi-layer neural network is easily obtained to infer the connections between the inputs and outputs. The proposed method is calculated in a highly efficient way. As long as the function expression between the input parameters and the output variables is given, namely the network is trained, both the Bilateral Sensitivity between layers and the Bilateral Sensitivity of each output variable with respect to different inputs can be effectively analyzed.
Based on this, the proposed Bilateral Sensitivity of neural network is applied to characterize the well connectivity in reservoir engineering. Given a trained network by Water Injection Rates (WIRs) and Liquid Production Rates (LPRs) data, the well connectivity can be efficiently characterized by the measure of Bilateral Sensitivity. The empirical results verify the effectiveness of the proposed method and the comparisons with some state-of-the-art methods demonstrate its superior performance. Besides, the proposed Bilateral Sensitivity is independent of the model and is easy to be implemented on other neural networks such as RBFs and CNNs.