Multi-Step Predictions for Adaptive Sampling using Proximal ADMM

Abstract—The paper presents a novel approach that uses multi-step predictions to address the adaptive sampling problem in a resource- and obstacle-constrained mobile robotic sensor network efficiently monitoring environmental spatial phenomena. It is first proposed to employ a Gaussian process (GP) to represent the spatial field, which can then be used to predict the field at unmeasured locations. The adaptive sampling problem aims to drive the mobile sensors to navigate the environment optimally, with the sensors adaptively taking measurements of the spatial phenomenon at each sampling step. To this end, a conditional-entropy-based optimality criterion is proposed, which aims to minimize the prediction uncertainty of the GP model. By predicting the possible measurements the mobile sensors may take over a horizon of multiple future sampling steps and exploiting the chain rule of conditional entropy, a multi-step predictions based adaptive sampling optimization problem is formulated. The objective of the optimization problem is to find the optimal sampling paths for the mobile sensors multiple sampling steps ahead, which yields benefits in terms of better navigation, deployment and data collection with more informative sensor readings. However, the optimization problem is nonconvex, complex, constrained and mixed-integer. It is therefore proposed to employ the proximal alternating direction method of multipliers algorithm to solve the problem efficiently. More importantly, the solution obtained by the proposed approach is theoretically guaranteed to converge to a stationary value. The effectiveness of the proposed algorithm was extensively validated on a real-world dataset, where the obtained results are highly promising.

extensively adopted in environmental monitoring applications [6]. A GP model can effectively learn a spatial phenomenon from a limited number of observations and then provide estimates of the spatial field and the corresponding uncertainty at any position of interest. However, to optimize the operation of a MRSN in practice, the mobile sensors are expected to take measurements at locations where the information gained by the sensor measurements is maximized while the prediction uncertainty is minimized. This fundamental problem is widely known as adaptive sampling [7]. An efficient adaptive sampling strategy can reduce both the number of deployed sensors and the number of measurements needed to learn an accurate model. Given the resource constraints in the network, including its limited communication, computation, memory and power, and the constraints on robot mobility, including collision avoidance and actuator limitations, formulating and solving the adaptive sampling problem is generally not straightforward. Therefore, research on the adaptive sampling problem has received increasing attention in recent works. Reviews of the adaptive sampling problem for a MRSN can be found in [8], [9], while a brief summary of some recent studies related to our work is given in Section II of this paper.
To the best of our knowledge, all the existing works considering adaptive sampling in a MRSN have addressed the problem by employing single-step predictions at only one future time step. That is, by predicting the potential measurements the mobile robotic sensors can take in the next step, the sampling strategy decides where the robots should move. In contrast, in this work, we propose a novel approach for addressing the adaptive sampling problem by using multi-step predictions at multiple time steps ahead. Unlike the previous works, which aim only to find the most informative locations for the mobile sensors at the next sampling step, in our approach the network seeks the optimal sampling paths for a horizon of multiple time steps in advance, which leads to better navigation and deployment of the mobile sensors, and hence better data collection with more informative measurements. Once the adaptive sampling problem is solved, the robots move to the next-step locations to take new measurements. This procedure is repeated at every sampling time instant. Note that the adaptive sampling objective is proved to be equivalent to a maximization of the conditional entropy of the spatial field at the sampling locations of all robotic agents over the prediction horizon, which can be represented by the log determinant of the GP posterior covariance matrix at these locations. Moreover, to guarantee avoidance of physical inter-robot and robot-obstacle collisions at each time step in the horizon, we propose to employ a scheme based on mixed-integer programming (MIP) [10]. The resulting optimization problem with mixed binary variables is thus nonconvex and complex, where the log determinant of the GP posterior covariance matrix in the objective function may cause computational intractability.
As a result, similar to our previous work [11], to effectively solve the adaptive sampling optimization problem, we propose to use the proximal alternating direction method of multipliers (proximal ADMM) algorithm [12], [13]. More importantly, in the convergence analysis we prove that by utilizing the proximal ADMM algorithm a stationary solution of the adaptive sampling problem can be theoretically obtained if the covariance function of the GP model representing the spatial field is the squared exponential (SE) kernel. Numerical simulations based on a real-world dataset were extensively conducted to verify the proposed approach, where the obtained results demonstrate that exploiting long-term planning can enhance the accuracy of the predicted fields. In summary, the contributions of this paper are threefold.
1) A new adaptive sampling optimization problem for a resource-constrained MRSN is formulated, in which the objective function is derived from multi-step predictions given collision avoidance constraints presented by the MIP.
2) An efficient proximal ADMM based approach is proposed to effectively solve the nonconvex, complex, constrained and mixed-integer adaptive sampling optimization problem.
3) Convergence analysis of the proposed algorithm is discussed, in which stationary solutions are theoretically guaranteed when the SE kernel is used in the GP model.
The remainder of this paper is arranged as follows. The related works are summarized in Section II. Section III introduces a new adaptive sampling approach for a MRSN to efficiently monitor a spatial field using multi-step looking-ahead predictions. Section IV presents a method to solve the adaptive sampling optimization problem by using the proximal ADMM optimization algorithm, while convergence analysis is discussed in Section V. The effectiveness of the proposed approach is demonstrated in Section VI before conclusions are drawn in Section VII.

II. RELATED WORKS
In the literature, there are various works pertaining to adaptive sampling in a MRSN for the purpose of monitoring environmental phenomena. For instance, the authors in [14] presented a method to partition a MRSN into several small groups, in which each group is responsible for finding its optimal sampling locations. Likewise, Tiwari et al. [15] considered a decentralized multi-robot system where each robot is allocated a local sensing zone for the monitoring purpose. By exploiting Voronoi concepts, the work [16] proposes a method that allows each sensing agent to conduct the adaptive sampling tasks within its dynamic cell area. Some other works [17], [18] use a flocking control approach to form a network of mobile sensors that track a virtual leader, leading to a path planning problem for a single agent.
However, the majority of the existing works require the mobile sensors to jointly find their optimal sampling locations in the entire region of interest. In those works, a metric of the predictions (e.g., predicted variances) is usually used to formulate the optimization problem. For instance, the authors in [19] proposed that sensing agents can determine their next sampling locations by maximizing their predictive variances. In [20], the maximum a posteriori estimation technique is employed to design an adaptive sampling strategy that aims to minimize prediction variances. In a similar fashion, an adaptive sampling minimization problem was developed in [21], where the cost function is an average of predictive variances over a set of pre-specified target points. Xu et al. [22] later designed a strategy to minimize both prediction errors and hyperparameter uncertainty simultaneously. From a different angle, the work [23] formalizes an active sensing criterion that trades off between gathering the most informative observations and predicting the phenomenon given the current but possibly inaccurate estimate of the correlation structure. Also considering the balance between exploration and exploitation in a Bayesian optimization problem, Marchant et al. [24] exploited the traveled distances of mobile robots to derive a sampling strategy. The authors in [25] derived a constrained dynamical optimization problem with two objectives that respectively locate peaks of environmental data and guarantee robot coordination. In recent works, a reinforcement learning based method is proposed in [26] to minimize the log volume of a confidence ellipsoid representing an estimation error, while both the mean value for exploitation and the variance estimate for exploration, given collision avoidance constraints, are maximized in [27]. Furthermore, multi-robot coverage control was taken into account in parallel with the sampling design [28], [29].
Another class of metrics utilized in deriving the adaptive sampling optimization problem relies on information criteria, including the Fisher information matrix, entropy and mutual information [30]-[32]. In [33], the upper confidence bound of the GP is adopted in an optimization criterion and the cross-entropy method is used to optimize the sampling paths. In our previous works [34], [35], a Gaussian Markov random field is exploited to represent a spatial field on an irregular discrete lattice and the adaptive sampling strategies are designed based on mutual information and conditional entropy, respectively. In other works [36], [37], we show that a grid-based greedy method can be employed to solve the adaptive sampling problems, where submodular properties are exploited to prove the solution bounds. However, we have recently found that better solutions of the adaptive sampling problem can be obtained in a continuous domain. More specifically, in [11] we proposed to solve the problem by the proximal ADMM, which exploits the proximal operator to deal with the complex adaptive sampling objective function and the constraints on robot mobility separately. The results in [11] verify that the proximal ADMM method outperforms the grid-based greedy algorithm. We then extended the problem to a network with non-holonomic robot dynamics, where the obtained results are highly promising [38]. Nevertheless, in all the previous works, the mobile sensors only exploit predictions at the next sampling step to decide where to move, which we call the single-step prediction method. In the following sections, we discuss the novel multi-step prediction approach for adaptive sampling.

III. MULTI-STEP PREDICTION APPROACH FOR ADAPTIVE SAMPLING OPTIMIZATION
This section introduces a new approach based on multi-step predictions for the adaptive sampling problem of a MRSN in environmental monitoring. At each sampling instant, the formulated optimization problem seeks the most informative sampling paths of the MRSN over a prediction horizon of the next H time steps, which are then employed to decide where the mobile sensors should move in the next sampling step.

A. Adaptive Sampling with Multi-Step Predictions
Consider a network of M mobile robots, where each robot is equipped with a sensor to take measurements of a spatial field. We define V = {1, . . . , M} as the set of robot indexes. We denote s_{i,t} = [s_{x,i,t}, s_{y,i,t}]^T as the location vector of robot i at time step t and assume that the locations of the mobile robots lie in a compact 2D Euclidean space Q. Let s_t = [s_{i,t}]_{i∈V} ∈ R^{2×M} denote the matrix of all robotic sensor locations at time step t, while s_{t1:t2} = [s_k]_{k=t1,...,t2} ∈ R^{2×(t2−t1+1)M} is defined as its collective matrix from time step t1 to t2. We constrain the distance travelled by each robot between two consecutive sampling steps as

∆s_min ≤ s_{i,t} − s_{i,t−1} ≤ ∆s_max, ∀i ∈ V. (1)

At each time step, the mobile sensor i takes a noisy measurement y_{i,t} of the spatial field of interest at its current location, which is modeled as

y_{i,t} = h(s_{i,t}) + w_i, (2)

where h : R² → R is a latent function representing the spatial phenomenon, while w_i is independent and identically distributed Gaussian zero-mean noise. All the sensor measurements at time step t are denoted by y_t = [y_{i,t}]_{i=1,...,M} ∈ R^M, and the collective measurements from time step t1 to time step t2 by y_{t1:t2} = [y_k]_{k=t1,...,t2}.

In environmental monitoring, a GP is commonly employed to learn a model of the latent spatial field from the collective sensor measurements. A GP derives the spatial relationship between sampled data by using a kernel (covariance function) and then performs Bayesian inference for predictions at unobserved locations. At time t, let G_t denote the GP model trained on the dataset D_t = (s_{0:t}, y_{0:t}), i.e., the locations and measurements taken by all mobile sensors up to time t. Statistically, G_t is fully specified by a mean function m(s; θ_t) and a kernel k(s′, s″; θ_t) that are parameterized by the hyperparameters θ_t. The hyperparameters θ_t can be obtained by maximizing the likelihood [5]. At time t + 1, define s_{t+1} = [s_{1,t+1}, . . . , s_{M,t+1}]; the joint prediction at s_{t+1} of G_t is a random variable ŷ_{t+1} ∼ N(µ_{ŷ_{t+1}|y_{0:t}}, Σ_{ŷ_{t+1}|y_{0:t}}), in which the posterior mean vector µ_{ŷ_{t+1}|y_{0:t}} and the posterior covariance matrix Σ_{ŷ_{t+1}|y_{0:t}} are computed as follows:

µ_{ŷ_{t+1}|y_{0:t}} = m_{t+1} + K_{t+1,0:t}(K_{0:t,0:t} + σ_n²I)^{−1}(y_{0:t} − m_{0:t}),
Σ_{ŷ_{t+1}|y_{0:t}} = K_{t+1,t+1} − K_{t+1,0:t}(K_{0:t,0:t} + σ_n²I)^{−1}K_{t+1,0:t}^T, (3)

where σ_n² > 0 is the Gaussian noise variance, I is an identity matrix of appropriate dimensions, K_{t+1,t+1} and K_{0:t,0:t} are the covariance matrices at s_{t+1} and s_{0:t}, K_{t+1,0:t} is the cross-covariance matrix between them, and m_{t+1} and m_{0:t} are the prior mean vectors at s_{t+1} and s_{0:t}, respectively, where each element m_i of these vectors is computed by m_i = m(s_i; θ_t).

Let us denote I_t = {t + 1, . . . , t + H} as the set of all time steps in the prediction horizon at the current time step t. For k ∈ I_t, given the GP model G_t and the predicted observations ŷ_{t+1:t+k−1} of G_t at the locations s_{t+1:t+k−1} from time t + 1 to t + k − 1, we define D̂_{t+k−1} = (s_{0:t+k−1}, ŷ_{0:t+k−1}) as a training dataset at future time step t + k − 1, where s_{0:t+k−1} = [s_{0:t}, s_{t+1:t+k−1}] and ŷ_{0:t+k−1} = [y_{0:t}, ŷ_{t+1:t+k−1}]. As a result, from (3), the posterior covariance matrix for the prediction ŷ_{t+k} at the sensor locations s_{t+k} is given by

Σ_{ŷ_{t+k}|ŷ_{0:t+k−1}}(s_{t+k}) = K_{t+k,t+k} − K_{t+k,0:t+k−1}(K_{0:t+k−1,0:t+k−1} + σ_n²I)^{−1}K_{t+k,0:t+k−1}^T. (4)

Given the GP model available at the current time step, the next optimal locations for the mobile sensors can be obtained by maximizing the conditional entropy of the spatial field at the next sampling locations [36]. The conditional entropy can be computed from the log determinant of the GP predictive covariance matrix. The optimization problem to adaptively find the optimal locations at the next time step is then formulated as follows:

s*_{t+1} = arg min_{s_{t+1}} − log det Σ_{ŷ_{t+1}|y_{0:t}}(s_{t+1}). (5)
However, given (4), the optimal sampling locations at each time step over the horizon of the next H sampling steps can also be calculated by

s*_{t+k} = arg min_{s_{t+k}} − log det Σ_{ŷ_{t+k}|ŷ_{0:t+k−1}}(s_{t+k}), ∀k ∈ I_t. (6)

In other words, the optimal locations over all the next H sampling steps, s*_{I_t}, can be found by solving the following optimization problem:

s*_{I_t} = arg min_{s_{I_t}} − log det Σ_{ŷ_{I_t}|y_{0:t}}(s_{I_t}). (7)
By using the chain rule for the conditional entropy [39], the multi-step predictions based objective function in (7) can be simplified as

H(s_{I_t}) = Σ_{k∈I_t} − log det Σ_{ŷ_{t+k}|ŷ_{0:t+k−1}}(s_{t+k}). (8)

The covariance matrix Σ_{ŷ_{I_t}|y_{0:t}}(s_{I_t}) is computed by

Σ_{ŷ_{I_t}|y_{0:t}}(s_{I_t}) = K_{I_t,I_t} − K_{I_t}(K_{0:t,0:t} + σ_n²I)^{−1}K_{I_t}^T, (9)

Fig. 1: Illustration of a rectangular obstacle.
where K_{I_t,I_t} ∈ R^{MH×MH} is the covariance matrix at s_{I_t} and K_{I_t} ∈ R^{MH×M(t+1)} is the cross-covariance matrix between s_{I_t} and s_{0:t}.
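The chain-rule simplification rests on a property of Gaussians: the log determinant of the joint posterior covariance over the horizon equals the sum of the log determinants of the step-wise conditional covariances (Schur complements), and these conditionals never depend on the imagined measurement values. A small numerical check of this identity, with an arbitrary positive-definite matrix standing in for the joint posterior covariance:

```python
import numpy as np

def chainrule_logdet(Sigma, M, H):
    """Sum of step-wise log-det conditional covariances over an H-step horizon.

    Each M-by-M diagonal block corresponds to the M sensors at one future
    step; conditioning on a step replaces the remaining block with its
    Schur complement, mirroring the sequential predictions in (4).
    """
    total = 0.0
    S = Sigma.copy()
    for k in range(H):
        blk = slice(k * M, (k + 1) * M)
        rest = slice((k + 1) * M, M * H)
        Sk = S[blk, blk]
        total += np.linalg.slogdet(Sk)[1]
        if (k + 1) * M < M * H:
            # Conditional covariance of the remaining steps given step k
            S[rest, rest] = S[rest, rest] - S[rest, blk] @ np.linalg.solve(Sk, S[blk, rest])
    return total
```

Running this against `np.linalg.slogdet` of the full matrix confirms that the sum in (8) and the joint objective in (7) agree, for any block partition of the horizon.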
For simplicity, we henceforth denote the matrix Σ_{ŷ_{I_t}|y_{0:t}}(s_{I_t}) by Σ(s), where s ∈ R^{2MH} denotes a vector that includes all variables in the matrix s_{I_t}. Note that to avoid numerical issues in the implementation due to the singularity of the covariance matrix (see Lemma 1) and to guarantee convergence of the optimization algorithm, a term ε²I is added to the covariance matrix (9), where ε² is a small positive constant; i.e., we solve the following problem:

minimize_s − log det (Σ(s) + ε²I). (10)

Lemma 1:
The matrices Σ(s) + ε²I and K + σ_n²I are invertible and their inverse matrices are bounded.
Proof: Since the matrix K is positive semi-definite (PSD) [5], for any eigenvalue λ_i of K we have λ_i ≥ 0 and det(λ_iI − K) = 0. Then det((λ_i + σ_n²)I − (K + σ_n²I)) = 0, which means λ_i + σ_n² is an eigenvalue of K + σ_n²I. Thus all eigenvalues of K + σ_n²I are lower bounded by the positive constant σ_n². Therefore, its inverse matrix exists and is bounded.
Similarly, since Σ(s) is a PSD matrix [5], all eigenvalues of Σ(s) + ε²I are lower bounded by the positive constant ε². Therefore, its inverse matrix exists and is bounded.
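The eigenvalue-shift argument of Lemma 1 is easy to check numerically. The sketch below builds a deliberately rank-deficient (hence singular) PSD matrix and verifies that adding σ_n²I makes the smallest eigenvalue at least σ_n², so the inverse exists and its spectral norm is at most 1/σ_n²; the matrix sizes and values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))
K = B @ B.T                      # rank-3 PSD matrix of size 5, hence singular
sigma_n2 = 0.25                  # plays the role of sigma_n^2

# Spectra before and after adding sigma_n^2 * I (eigvalsh returns ascending order)
eig_K = np.linalg.eigvalsh(K)
eig_shifted = np.linalg.eigvalsh(K + sigma_n2 * np.eye(5))

# Spectral norm of the inverse of the regularized matrix
inv_norm = np.linalg.norm(np.linalg.inv(K + sigma_n2 * np.eye(5)), 2)
```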

B. Collision Avoidance
To avoid robot-robot and robot-obstacle collisions, in this work we propose to employ the MIP scheme [10]. Firstly, an inter-robot collision between two agents i and j at every time step can be avoided by the following constraints, where the time-step subscript t is omitted for brevity:
where d_ij is the safety distance between agents i and j. The above constraints can be represented by AND logic statements as follows.
Secondly, regarding obstacle avoidance, we assume that obstacles are stationary and represented by rectangles, as illustrated in Figure 1, which can likewise be converted to AND logic statements as follows.
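Although the paper's exact logic statements are given in (11) and (12), disjunctive separation requirements of this kind are commonly encoded in a MIP via big-M constraints: each of the four inequalities obtained by splitting "|Δx| ≥ d_ij or |Δy| ≥ d_ij" by sign is relaxed by a binary variable, and requiring the binaries to sum to at most three forces at least one inequality to remain active. A minimal sketch of that standard encoding (the function name and constants are illustrative):

```python
def separation_feasible(si, sj, c, d_ij=1.0, M_big=100.0):
    """Big-M check of '|dx| >= d_ij or |dy| >= d_ij' for positions si, sj.

    c is a 0/1 vector with one binary per relaxed inequality; setting
    c[m] = 1 deactivates inequality m, and sum(c) <= 3 guarantees that
    at least one of the four original inequalities stays active.
    """
    dx, dy = si[0] - sj[0], si[1] - sj[1]
    active = [
        dx >= d_ij - M_big * c[0],      # robot i sufficiently to the right of j
        -dx >= d_ij - M_big * c[1],     # sufficiently to the left
        dy >= d_ij - M_big * c[2],      # sufficiently above
        -dy >= d_ij - M_big * c[3],     # sufficiently below
    ]
    return all(active) and sum(c) <= 3
```

In a solver such as Gurobi, the binaries become decision variables and the same inequalities become linear constraints; infeasibility of every binary assignment then corresponds to an unavoidable collision.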

C. Adaptive Sampling Optimization Problem
Given the adaptive sampling objective (8), the constraints on each robot's movement (1), and the MIP-based collision avoidance constraints, the adaptive sampling problem in a MRSN using the multi-step predictions can be formulated as

minimize_{s,c} H(s)
subject to s_{i,k} ∈ Q, ∀i ∈ V, ∀k ∈ I_t,
∆s_min ≤ s_{i,k} − s_{i,k−1} ≤ ∆s_max, ∀i ∈ V, ∀k ∈ I_t,
(11) and (12), ∀i ≠ j ∈ V, ∀k ∈ I_t, (13)

where c denotes the concatenated vector of all binary variables over the horizon. However, to facilitate the convergence analysis, we redefine c as a real vector and constrain it to a nonconvex set containing only the two elements {0, 1}. Moreover, we represent this nonconvex constraint together with all the constraints in (13) by s, c ∈ C, leading to a simplified version of (13) as follows:

minimize_{s,c} H(s) subject to (s, c) ∈ C. (14)
It can be seen that (14) is a mixed-integer nonlinear programming problem, where the complexity of the objective function, involving the log determinant of an MH × MH covariance matrix, can lead to computational intractability [11]. Therefore, in order to solve it effectively, we propose to exploit the proximal ADMM algorithm [12], [13], as discussed in the following section.
IV. PROXIMAL ADMM FOR MULTI-STEP PREDICTIONS BASED ADAPTIVE SAMPLING PROBLEM

In our previous work [11], the proximal ADMM algorithm was proposed to solve the adaptive sampling problem with single-step predictions. We now extend it to efficiently address the multi-step predictions based problem.
Let x = [s^T, c^T]^T define the concatenated vector of optimization variables, and define a matrix F that extracts s from the vector x. Hence, the objective function in (14) can be rewritten as f_0(x) = H(Fx). If I_C(·) denotes the indicator function of C, then the problem (14) can be presented in the two-block consensus form

minimize_{x,z} f_0(x) + I_C(z) subject to x − z = 0, (15)

where z is a copy of x. The augmented Lagrangian function of the problem (15) is defined by

L(x, z, u) = f_0(x) + I_C(z) + u^T(x − z) + (ρ/2)‖x − z‖², (16)

where u is a vector of the associated dual variables and ρ is a regularization parameter. The nonconvex conventional ADMM algorithm [40] can then be utilized to solve the optimization problem (15) through the following steps at each iteration:

z^(k+1) = arg min_z L(x^(k), z, u^(k)), (17a)
x^(k+1) = arg min_x L(x, z^(k+1), u^(k)), (17b)
u^(k+1) = u^(k) + ρ(x^(k+1) − z^(k+1)). (17c)
However, it can be seen that exactly solving the x-minimization step (17b), which involves the log determinant of a covariance matrix, is computationally intractable. To deal with the complexity of this function, the proximal ADMM method [12], [13] exploits its first-order approximation to better address the problem (15) as follows:

z^(k+1) = arg min_z I_C(z) + u^(k)T(x^(k) − z) + (ρ/2)‖x^(k) − z‖², (18a)
x^(k+1) = arg min_x ∇f_0(z^(k+1))^T(x − z^(k+1)) + (L/2)‖x − z^(k+1)‖² + u^(k)T(x − z^(k+1)) + (ρ/2)‖x − z^(k+1)‖², (18b)
u^(k+1) = u^(k) + ρ(x^(k+1) − z^(k+1)). (18c)
Algorithm 1 Proximal ADMM algorithm for the multi-step predictions based adaptive sampling problem
Require: x^(0), u^(0), ε_res, k_max, ρ, L
1: for k = 0, . . . , k_max do
2: Compute z^(k+1) by (18a)
3: Compute x^(k+1) by (19)
4: Compute u^(k+1) by (18c)
5: if ‖z^(k+1) − x^(k+1)‖ ≤ ε_res then
6: Stop and return x^(k+1)

where L is the Lipschitz constant of ∇f_0(·). It is important to note that the z-minimization step (18a) is a mixed-integer quadratic program (MIQP) and can be solved efficiently by any MIQP solver, whereas the x-minimization step (18b) is an unconstrained convex quadratic minimization that can be solved analytically by

x^(k+1) = z^(k+1) − (∇f_0(z^(k+1)) + u^(k))/(L + ρ). (19)

The term ∇f_0(z^(k+1)) in (19) is computed as follows. The gradient of f_0(·) with respect to c vanishes, while the derivative of f_0(·) with respect to each element s_j of the vector s can be computed by

∂H(s)/∂s_j = −tr((Σ(s) + ε²I)^{−1} ∂Σ(s)/∂s_j), (20)

where from (9) we have

∂Σ(s)/∂s_j = ∂K_{I_t,I_t}/∂s_j − (∂K_{I_t}/∂s_j)(K_{0:t,0:t} + σ_n²I)^{−1}K_{I_t}^T − K_{I_t}(K_{0:t,0:t} + σ_n²I)^{−1}(∂K_{I_t}^T/∂s_j). (21)

The steps of the proximal ADMM algorithm for the multi-step predictions based adaptive sampling problem are summarized in Algorithm 1. The algorithm terminates when the norm of the dual errors, i.e., ‖z^(k+1) − x^(k+1)‖, is less than or equal to a predefined threshold ε_res, or when a maximum number of iterations k_max is reached. Note that at each iteration the proximal ADMM algorithm involves only one optimization problem, in the form of a MIQP, which makes the implementation more tractable compared with directly solving the original mixed-integer nonlinear programming problem with the highly complex objective function in (14).
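A minimal scalar sketch of the three-step iteration (z-step, linearized x-step, dual update) on a toy problem may help fix ideas; the objective, the discrete set C and all parameter values below are illustrative and not the paper's problem data:

```python
import numpy as np

# Toy instance: smooth objective f0(x) = (x - 0.7)^2 over the discrete
# set C = {0, 1} (standing in for the binary constraints of the paper).
grad_f0 = lambda z: 2.0 * (z - 0.7)      # gradient of f0; Lipschitz with L = 2
C = np.array([0.0, 1.0])
rho, L = 10.0, 2.0                       # penalty and Lipschitz constants

x, u = 0.6, 0.0
for _ in range(100):
    # z-step: with the other terms fixed, it reduces to projecting
    # x + u/rho onto the (here discrete) feasible set C
    z = C[np.argmin(np.abs(C - (x + u / rho)))]
    # x-step: closed-form minimizer of the linearized subproblem,
    # x = z - (grad_f0(z) + u) / (L + rho)
    x = z - (grad_f0(z) + u) / (L + rho)
    # dual update
    u = u + rho * (x - z)
```

In this instance the iterates settle on z = 1 (the member of C nearest the unconstrained minimizer 0.7), with x converging to z and the dual variable absorbing the residual gradient, which mirrors the stationarity notion analyzed in Section V.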

V. CONVERGENCE ANALYSIS
This section presents the convergence analysis of the proposed algorithm. Specifically, we will show that if the widely used SE covariance function is employed in the GP model, the two following conditions are satisfied: (1) f_0(·) has a Lipschitz continuous gradient, and (2) I_C(·) is lower semi-continuous. Hence, the solution obtained by the proposed algorithm converges to a stationary value.
Assumption 1: The SE kernel is used in the GP model G_t. The SE kernel function is given by

k(s′, s″) = σ_f² exp(−(1/2)(s′ − s″)^T Λ^{−1}(s′ − s″)), (22)

where Λ = diag(λ_1², λ_2²) is a diagonal matrix of the length-scales and σ_f² is the variance of the kernel. Note that, due to the complexity of the mathematical derivations in the general case, in this work we take the SE kernel as an example for analyzing the convergence of the proposed algorithm. Nonetheless, the proposed approach as presented in Sections III and IV can be applied to a GP model with any covariance function. We now derive some properties related to the SE kernel.

Lemma 2: The following functions are bounded: 1) k(s′, s″); 2) ∂k(s′, s″)/∂s′_j and ∂k(s′, s″)/∂s″_j, for j = x, y; 3) ∂²k(s′, s″)/∂s′_i∂s′_j; and 4) ∂²k(s′, s″)/∂s′_i∂s″_j, for i, j = x, y.
Proof: We first present the explicit forms of the functions in parts 2, 3 and 4 of Lemma 2; note that k(s′, s″) itself is given in (22). The explicit forms of the derivatives in parts 2, 3 and 4, distinguishing the cases i = j and i ≠ j, are given in (23), (24), (25), (26) and (27). Note that any function of the form x^n exp(−αx²) is bounded for fixed α > 0 and n ∈ N. Therefore, given the derivations in (22)-(27), all the functions stated in Lemma 2 are bounded.

Lemma 3: The matrix ∂²Σ(s)/∂s_i∂s_j is bounded.

Proof: The explicit form of ∂²Σ(s)/∂s_i∂s_j can be specified from (28). Given the results in Lemma 2, it can easily be derived that ∂²Σ(s)/∂s_i∂s_j is bounded.
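The boundedness argument rests on explicit derivatives of the SE kernel. The sketch below evaluates the first-order derivative of (22) with respect to s′ analytically and checks it against central finite differences; the parameter values are arbitrary:

```python
import numpy as np

sigma_f2 = 2.0                        # kernel variance sigma_f^2
lam2 = np.array([1.5, 0.8]) ** 2      # squared length-scales, diagonal of Lambda

def k_se(sp, spp):
    """SE kernel (22): sigma_f^2 exp(-0.5 (s'-s'')^T Lambda^{-1} (s'-s''))."""
    d = sp - spp
    return sigma_f2 * np.exp(-0.5 * np.sum(d ** 2 / lam2))

def dk_dsp(sp, spp):
    """Analytic gradient of k with respect to s': -k(s', s'') Lambda^{-1}(s'-s'')."""
    return -k_se(sp, spp) * (sp - spp) / lam2

sp, spp = np.array([0.3, -0.4]), np.array([1.1, 0.2])
eps = 1e-6
fd = np.array([(k_se(sp + eps * e, spp) - k_se(sp - eps * e, spp)) / (2 * eps)
               for e in np.eye(2)])
```

Each derivative is a polynomial in (s′ − s″) multiplied by a Gaussian factor, which is exactly the bounded form invoked in the proof above.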
Lemma 4: The function f 0 (·) has a Lipschitz continuous gradient with a Lipschitz constant L.
Proof: Since the gradient of f_0(·) with respect to c vanishes, it suffices to prove that H(s) has a bounded Hessian matrix ∇²H(s). From (20), each entry of the Hessian matrix is composed of the inverse matrices bounded in Lemma 1 and the bounded derivatives of Σ(s) established in Lemmas 2 and 3; hence every entry of ∇²H(s) is bounded and ∇f_0(·) is Lipschitz continuous.
Lemma 5: The indicator function I_C(·) is lower semi-continuous.

Proof: As C is the intersection of a closed discrete set, the compact set Q and some halfspaces, C is a closed set. Therefore, I_C(·) is lower semi-continuous since it is the indicator function of a closed set.
Given Lemmas 4 and 5, we are now able to derive a convergence property for the proposed algorithm, provided the following assumption on the parameters of the proximal ADMM holds.
Note that the obtained stationary point is not globally unique unless the augmented Lagrangian L in (16) is a Kurdyka-Łojasiewicz (KL) function [12].

VI. SIMULATION RESULTS

A. Simulation Setup
The simulations were executed on a DELL computer with a 3.0 GHz Intel Core i5 CPU and 8 GB of RAM. The algorithms were implemented in the Julia programming language, with the JuMP package [42] and the Gurobi optimizer [43] used to construct and solve the MIQP problems. In order to generate sensor measurements and then verify the predictions, a simulated environmental phenomenon was created based on a realistic dataset [41]. Specifically, a ground-truth GP model, learned from indoor temperature observations collected by 54 sensors deployed in the Intel Berkeley Research Lab at a specific time instant, was utilized both to generate the 'virtual' measurements taken by the mobile sensors and to provide the spatial field data against which the predictions are compared. The environmental field over the entire 40 m-by-30 m map generated by the ground-truth GP model is depicted in Figure 2.
In the simulations, a network of 5 robotic sensors was employed to monitor the indoor temperature phenomenon in the region of interest Q := {(x, y) | 0 m ≤ x ≤ 40 m, 0 m ≤ y ≤ 30 m}. The minimum and maximum distances travelled by each mobile sensor between two sampling steps were set as ∆s_min = [−2, −2]^T (m) and ∆s_max = [2, 2]^T (m), respectively. The parameters of the proximal ADMM algorithm were selected as ρ = 1, L = 0.1, ε_res = 10^{−3}, and k_max = 100. In each experiment, the mobile sensors conducted 15 sampling steps; that is, a total of 80 measurements were gathered by the network at the end of the data collection. Since in practice the mobile robots can start from any random initial locations in the environment, the starting locations of the mobile sensors in each simulation were chosen randomly but had to be feasible in the environment. For a comprehensive evaluation of our proposed approach, we conducted 1000 experiments in which two stationary obstacles were positioned in the environment, represented by the gray rectangles in Fig. 4. Three scenarios with different prediction horizons, H = 1 (the single-step prediction based method), H = 3 and H = 5 (the multi-step predictions based algorithm), were considered in the validation process.

B. Results and Discussions
The convergence behavior of the proximal ADMM algorithm in the first time step of a specific simulation, including the norm of the residuals ε^(k) = ‖z^(k) − x^(k)‖, the objective value f_0(x^(k)), and an element of the dual variable u^(k), is illustrated in Figures 3a, 3b and 3c, respectively. The results show that the proposed algorithm guarantees convergence of the residuals, objective values and dual variables.
To demonstrate the performance of the single-step and multi-step predictions based approaches, we summarize example results in Figures 4 and 5, respectively. For instance, in one particular experiment, the sampling paths of the mobile sensors after 15 sampling steps under the two methods, with H = 5 and H = 1, are illustrated in Figures 4c and 4f, respectively. Moreover, the indoor temperature fields over the whole environment predicted from the collective sensor measurements up to time steps t = 5, t = 10 and t = 15, obtained by the two approaches, are depicted in Fig. 5. It can be seen that the more measurements are gathered, the closer the prediction results are to the ground truth shown in Fig. 2. The corresponding predicted variances are shown as the background in Fig. 4. Both the multi-step and single-step predictions based adaptive sampling strategies drive the robotic sensors to explore areas associated with a higher level of uncertainty while preventing collisions among the robots as well as between the robots and the stationary obstacles. Nevertheless, since the multi-step predictions based approach can predict possible sensor measurements multiple sampling steps ahead, it drives the mobile sensors to navigate the environment more efficiently, which leads to more accurate prediction results. In other words, over time the prediction uncertainties obtained by the multi-step predictions based strategy are significantly reduced compared with those obtained by the single-step predictions based scheme, as can be seen in Fig. 4.
To further validate the benefits of using multi-step predictions in the proposed approach, we examined three validation metrics, namely the maximum variance (MV), the root mean squared error (RMSE) and the maximum absolute error (MAE), for three horizon lengths H = 1, 3, 5. The results of one example experiment computed over 15 sampling steps are shown in Fig. 6, where the multi-step looking-ahead predictions result in a significant reduction of the prediction errors. It can be seen that once an adequate number of measurements has been collected, e.g. after 11 time steps, the longer the prediction horizon, the more accurate the predicted results. Furthermore, for the 15th time step, we summarized the three validation metrics over all 1000 simulations in the box plots shown in Fig. 7. It can be clearly seen that the multi-step looking-ahead predictions in the proposed method lead to considerably better prediction results.
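For concreteness, the three validation metrics can be computed as below; note that, following the text, MAE here denotes the maximum (not mean) absolute error. The function name is illustrative:

```python
import numpy as np

def validation_metrics(y_true, y_pred, pred_var):
    """MV, RMSE and MAE of a predicted field against the ground truth."""
    err = np.asarray(y_pred) - np.asarray(y_true)
    return {
        "MV": float(np.max(pred_var)),              # maximum predictive variance
        "RMSE": float(np.sqrt(np.mean(err ** 2))),  # root mean squared error
        "MAE": float(np.max(np.abs(err))),          # maximum absolute error
    }
```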
The computation time for the different horizon lengths at each time step, averaged over the 1000 experiments, is shown in Fig. 8. Unsurprisingly, the computation time is proportional to the length of the prediction horizon. More specifically, at each time step the single-step prediction approach takes about 0.6 s on average, while the multi-step prediction approaches with H = 3 and H = 5 take approximately 1.4 s and 2.0 s, respectively. Therefore, in practice, depending on the application, the horizon length can be chosen to balance prediction accuracy against computation time.
VII. CONCLUSIONS

The paper has discussed a novel and efficient method that exploits multi-step looking-ahead predictions for adaptive sampling in a MRSN monitoring spatial fields. By employing a GP model to represent the spatial phenomenon, the possible sensor measurements can be predicted multiple sampling steps ahead. The adaptive sampling optimization problem, derived from the conditional entropy, is formulated to find the optimal sampling paths for the mobile sensors multiple time steps in advance. Exploiting the multi-step looking-ahead predictions in the sampling strategy provides a resource- and obstacle-constrained MRSN with significant benefits, including optimal navigation and more informative data collection. Though the optimization problem is nonconvex, complex, constrained and mixed-integer, it can be efficiently solved by the proximal ADMM. More importantly, in the common scenario where the SE covariance function is utilized in the GP model, it is theoretically proved that the solution obtained by the proposed algorithm converges to a stationary point. The extensive evaluation conducted in a synthetic environment based on a real-world dataset has verified the efficacy of the proposed approach.