Exploiting Variational Inequalities for Generalized Change Detection on Graphs

This article introduces a unified framework for developing graph-based change detection (CD) algorithms in remote sensing, which is based on signal feasibility problems and variational inequalities. We argue that signal feasibility problems provide a natural way to frame the CD problem, while variational inequalities, core elements of modern data science and signal processing methods, enable us to find efficient, stable, and reliable solutions to the proposed feasibility problems. We demonstrate the design of both semisupervised and unsupervised CD schemes from our perspective, establishing connections with graph Laplacian filtering and graph convolutional networks (GCNs). In contrast to specialized methods that rely on composite objective functions with multiple penalty parameters, our approach greatly simplifies hyperparameter selection, as the hyperparameters are both bounded and can form convex combinations (i.e., they are nonnegative and sum up to one). We evaluate our approach on various real heterogeneous and homogeneous datasets, demonstrating its capabilities compared to traditional and modern CD methods. In addition, our ablation studies confirm the consistency of our solutions under variations in the number of nodes and graph structure learning (GSL) methods. We conclude by discussing the advantages, limitations, and promising future research directions, with connections to graph filtering, sampling set selection, and self-supervised learning. The source code to replicate the experiments and explore the approach further is available on GitHub at https://github.com/jfflorez/Exploiting-variational-inequalities-for-generalized-change-detection-on-graphs.git.

The goal of CD is to construct a map of a certain area that indicates changes in land cover and land use by analyzing a pair of (multichannel) satellite images of the same area taken at different times. A simple and sometimes effective approach to CD is to leverage direct image subtraction and analyze the image difference, commonly referred to as the difference image [1]. However, in most practical settings, under different meteorological conditions, satellite imagery comes from heterogeneous data sources. As a result, the difference image may be contaminated by other sources of variation and no longer accurately capture changes on the earth's surface [2].
To deal with heterogeneous data, traditional statistical methods often characterize the joint distribution of the pixel space under assumptions on the sensing process and the statistical properties of the noise [3], [4]. For instance, Prendes et al. [5] model the joint distribution as a parametric mixture model and design a statistical hypothesis test on the parameter space to decide whether or not a small square region of the scene has changed. Since the (often stationarity) assumptions may not hold accurately in practice, statistical methods like this may have difficulty adapting to complex feature spaces.
Accordingly, contemporary approaches (e.g., [6], [7]) aim to reduce the heterogeneous CD problem to a more homogeneous one. The key idea is to learn a mapping that takes a pre-event image in a certain domain (e.g., a class of spectral images) into the post-event image's domain (e.g., a class of synthetic aperture radar images). In this manner, we can generate comparable images and thus a more accurate difference image, on which traditional homogeneous CD methods can be used. Inspired by this (image translation) paradigm, Luppino et al. [8] propose a promising CD approach based on cycle generative adversarial networks and convolutional autoencoders. Putting aside its large training sample complexity, this approach also requires careful hyperparameter selection, which may limit its practical use. This is because the adversarial loss functions used for training can be unstable and oscillatory, making hyperparameter selection problematic [9], [10]. This has prompted research to stabilize training using different regularization mechanisms [9], where, broadly speaking, regularization terms are introduced to encourage latent feature vectors to satisfy pre-specified pairwise affinities.
The concept of pairwise affinity is noteworthy because state-of-the-art graph-based methods explicitly utilize pairwise affinities or, more generally, pairwise relationships for change detection [2]. Recent research in this field has shown that the signal of interest, whether it is a difference image or a change map, exhibits smoothness properties with respect to the structure of a graph inferred from satellite imagery. Consequently, these properties can be effectively employed in the design of stable and accurate heterogeneous change detection methods [2], [11], [12], [13], [14], [15].
In this paper, the goal is to build on these successes and, in particular, exploit the graph smoothness prior from a different perspective, one based on signal feasibility problems and variational inequalities. Figure 1 summarizes the workflow and key components of our method. On the one hand, we argue that signal feasibility problems provide a natural way to express various sources of prior knowledge and regularity constraints that are useful for change detection. To justify this idea, we show that at least two graph-based change detection schemes, operating in semisupervised and unsupervised regimes, can be formulated as signal feasibility problems. On the other hand, variational inequalities are capable of describing general notions of equilibrium that go beyond traditional optimization techniques [16]. This is of utmost importance in our setting since, as will be demonstrated, the solution set of the problems of interest can be accurately described by variational inequalities. In turn, our solutions can be found efficiently using provably convergent block-iterative fixed-point algorithms [16], [17].
Unlike traditional methods that rely on composite objective functions with several unrestricted penalty parameters, when we formulate the CD problem as a feasibility problem, its solution depends on a surrogate problem (i.e., a suitable relaxation) with restricted penalty parameters that are nonnegative and sum up to one. Clearly, this facilitates hyperparameter tuning. As demonstrated in [17], the surrogate problem is a variational inequality that can handle possibly inconsistent feasibility problems. This turns out to be an interesting feature, since our knowledge and data-driven constraints may be to some extent inconsistent, or it may be impossible to satisfy them simultaneously. In such cases, the surrogate problem enables us to find solutions that satisfy those constraints in an approximate sense [17]. As will be clarified, the obtained solutions are reliable in the sense that they satisfy desirable properties such as global optimality, efficiency, and stability. This is because they result from solving an underlying convex problem with first-order methods (e.g., gradient descent type methods), which are generally efficient and stable for large-scale optimization problems.
Interestingly, by taking this approach, we are also able to shed light on the significance of graph filtering in generating accurate change maps. This enables us to leverage more general notions of graph smoothness, including those utilized in graph convolutional networks (GCNs). To demonstrate this, we harness the capabilities of untrained or randomly initialized GCNs to construct spaces of smooth graph signals, establishing potential connections with compressive spectral clustering. This also raises intriguing questions about the suitable training of GCNs, or more generally graph neural networks, using self-supervision to potentially enhance performance in our context.
According to the terminology used by Teodoro et al. [18], the proposed method can be categorized as scene-adapted. In contrast, deep learning methods based on end-to-end learning (e.g., [19], [20] and references therein) can be regarded as class-adapted. Unlike scene-adapted methods that rely only on a single image pair of the scene of interest, class-adapted methods are more ambitious in that their goal is to learn hierarchical features that explain changes in a whole image class, such as urban changes [19], [21] or landslides [22], to name a few. Due to limited training datasets, class-adapted methods may still struggle to perform well on examples outside the training set and may not generalize across different image classes. A crucial step towards fully automated change detection is thus to generate more diverse annotated image pairs. Unfortunately, the task is inherently difficult, and most human annotators need to be trained, which can lead to inconsistent labeling [21]. Such inconsistencies may harm the overall performance of supervised learning. Our paper thus advocates for the development of graph-based methods, not to preclude the use of deep learning approaches, but because they can reduce labeling complexity and provide clear guidelines to minimize inconsistencies among different human annotators [23].
The remainder of the paper is organized as follows. Section II lays out graph theoretical concepts and states the problems of label propagation and signal decomposition that encompass the fundamental principles of several change detection methods. Section III expresses such problems as feasibility problems, leading to the proposed approaches to semisupervised and unsupervised CD. Section IV demonstrates that the proposed signal feasibility problems are part of a general class of problems and elaborates on how to use variational inequalities to solve such problems; here we also explain important properties the components need to satisfy and state an algorithm to solve the problem. Section V adapts the concepts to the context of change detection, and Section VI performs a series of experiments to test the capabilities of our approach. Section VII discusses advantages, limitations, and avenues for future research, and concludes the work.

II. BACKGROUND AND PROBLEM STATEMENT
We now review important graph signal processing (GSP) concepts [24] and then state two important problems that encompass particular instances of the problem of change detection.
A graph G = (V, E, w) is a tuple that consists of a vertex set V = {1, . . ., n}, an edge set E ⊂ V × V, and a nonnegative weight function w : E → [0, ∞) such that w(i, j) = 0 for (i, j) ∉ E and w(i, j) = w(j, i) for (i, j) ∈ E. For algebraic manipulation, a graph G is often represented by an n-by-n adjacency matrix W_G = (W_ij)_{1≤i,j≤n} with entries defined by the weight function as W_ij = w(i, j). Equally important, a graph G can be associated with the graph Laplacian matrix, written in terms of the adjacency matrix as

L_G = D_G − W_G,    (1)

where D_G is an n-by-n diagonal matrix whose i-th diagonal entry is the degree of node i, defined as Σ_{j∈V} w(i, j) for i ∈ V. Since the graph Laplacian L_G is symmetric positive semidefinite, it can be factored as [25]

L_G = U Λ U^T,    (2)

where U ∈ R^{n×n} is an orthogonal matrix, i.e., U^T U = I_n, and Λ = diag(λ_1, . . ., λ_n) is a diagonal matrix whose entries λ_i are the eigenvalues of L_G and satisfy 0 = λ_1 ≤ · · · ≤ λ_n.
The graph Laplacian plays an instrumental role in defining the smoothness of a signal defined on the nodes of the graph G. A signal x on G denotes a function from V into the real numbers and can be represented as an n-dimensional vector x = (x_1, . . ., x_n)^T ∈ R^n. The smoothness of x with respect to G is given by the graph Laplacian quadratic form x ↦ x^T L_G x, also known as the Dirichlet energy. A graph G is said to be connected if for any two vertices a, b ∈ V there is a path connecting them, that is, a sequence of vertices {v_i}_{i=0}^k, k ≤ n, such that v_0 = a, v_k = b, and (v_{i−1}, v_i) ∈ E for i = 1, . . ., k.
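To make these definitions concrete, the following sketch (Python with NumPy; the path graph and signals are illustrative, not taken from the paper's datasets) builds a graph Laplacian, factors it as in (2), and evaluates the Dirichlet energy of a constant and an oscillating signal:

```python
import numpy as np

# Hypothetical 4-node path graph with unit edge weights.
n = 4
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

D = np.diag(W.sum(axis=1))  # degree matrix D_G
L = D - W                   # combinatorial graph Laplacian L_G = D_G - W_G

# Eigendecomposition L = U diag(lam) U^T, eigenvalues sorted ascending.
lam, U = np.linalg.eigh(L)

# Dirichlet energy x^T L x: zero for constant signals, larger for rough ones.
x_const = np.ones(n)
x_rough = np.array([1.0, -1.0, 1.0, -1.0])
print(x_const @ L @ x_const)  # ~0 (constants lie in the null space)
print(x_rough @ L @ x_rough)  # 12.0
```

As expected, λ_1 ≈ 0 with a constant eigenvector, and the oscillating signal has much larger Dirichlet energy than the constant one.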

A. Label propagation
Label propagation (LP), one of the most well-known semi-supervised learning approaches, exploits the assumption that the label function is sufficiently smooth on the graph, so that such a labeling function can be estimated from a few labeled nodes. More formally, an unknown label function f : V → {−1, 1}, which happens to be sufficiently smooth on a connected graph G, can be accurately estimated from a few samples at S ⊂ V by solving

minimize_{x ∈ R^n} x^T L_G x subject to x_i = f(i), i ∈ S,    (3)

and setting the estimate f̂ of f such that f̂(i) = 1 if x_i > 0 and f̂(i) = −1 otherwise, for all i ∈ V. As aforementioned, x ↦ x^T L_G x measures the smoothness of x with respect to G. Problem (3) thus searches for the smoothest signal x on G that complies with the samples of f at S. From a GSP perspective, the minimum number of labeled samples required for accurate estimates depends on the bandwidth of the signal of interest on the graph and the choice of S [26]. In our experiments, we use a form of stratified random sampling to construct S. However, sampling set selection strategies based on GSP principles, e.g., [24], [27], can also be adopted, perhaps leading to gains in sample efficiency and reconstruction accuracy.
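A minimal sketch of the LP problem on a toy path graph, assuming the standard harmonic-extension closed form (eliminate the labeled variables and solve the unlabeled block of the optimality conditions); the function name and example graph are illustrative:

```python
import numpy as np

def label_propagation(L, S, f_S):
    """Solve min_x x^T L x subject to x[i] = f(i) on the labeled set S.

    L: (n, n) graph Laplacian of a connected graph.
    S: list of labeled node indices; f_S: their +/-1 labels.
    Returns the harmonic extension x and hard labels sign(x).
    """
    n = L.shape[0]
    Uset = np.setdiff1d(np.arange(n), S)  # unlabeled nodes
    x = np.zeros(n)
    x[S] = f_S
    # First-order optimality on the unlabeled block: L_UU x_U = -L_US x_S.
    x[Uset] = np.linalg.solve(L[np.ix_(Uset, Uset)], -L[np.ix_(Uset, S)] @ x[S])
    return x, np.where(x > 0, 1, -1)

# Hypothetical 6-node path graph: label only the two endpoints.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(1)) - W
x, labels = label_propagation(L, [0, n - 1], [1.0, -1.0])
print(labels)  # first half +1, second half -1
```

On the path graph the harmonic extension is the linear ramp from +1 to −1, so thresholding splits the path in half, as expected.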

B. Signal decomposition
We now consider a setting where signals x_1, x_2, δ on V satisfy

x_2 = x_1 + δ,    (4)

and we would like to find x_1 and δ provided that we have access to x_2. This problem is clearly underdetermined, so we need prior knowledge about x_1 and δ to find accurate estimates of them. Using graph smoothness assumptions, we can better condition the problem. In particular, provided graphs G_1 and G on which x_1 is the smoothest and δ is sufficiently smooth, respectively, we can estimate x_1 and δ as follows:

minimize_{x_1, δ ∈ R^n} x_1^T L_{G_1} x_1 subject to x_2 = x_1 + δ, L_G δ ∈ B_{ℓ2}(0, ϵ).    (5)

Since we assume that δ is sufficiently smooth, we control its degree of smoothness by varying the radius ϵ of the zero-centered Euclidean ball B_{ℓ2}(0, ϵ): the smaller the radius, the smoother δ on G. In the context of CD, the works [2], [7], [12], [13], [28] have developed state-of-the-art methods that rely on these modeling assumptions. Here x_1 and x_2 are associated with pre-event and post-event signals, and δ with their difference signal. We note that (5) may not be strictly convex, so it may not have a unique solution. In our experiments, we always assume G_1 and G to be connected, because this reduces the dimension of their graph Laplacians' null spaces, improving the stability of the solution.
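The decomposition can be illustrated with a penalized (unconstrained) surrogate rather than the exact ball-constrained program above: the smoothness constraint on δ is traded for a penalty weight gamma, and a small ridge alpha handles the shared constant null space of the two Laplacians. The function name and parameter values are illustrative assumptions of this sketch, not the paper's formulation:

```python
import numpy as np

def decompose(x2, L1, Lg, gamma=1.0, alpha=1e-6):
    """Penalized surrogate of x2 = x1 + delta:
    minimize x1^T L1 x1 + gamma * delta^T Lg delta + alpha * ||delta||^2
    with delta = x2 - x1. Setting the gradient to zero gives
    (L1 + gamma*Lg + alpha*I) x1 = (gamma*Lg + alpha*I) x2."""
    n = L1.shape[0]
    A = L1 + gamma * Lg + alpha * np.eye(n)
    x1 = np.linalg.solve(A, (gamma * Lg + alpha * np.eye(n)) @ x2)
    return x1, x2 - x1

# Hypothetical 5-node path graph used for both G_1 and G.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L_path = np.diag(W.sum(1)) - W

x2 = np.array([0.0, 1.0, 5.0, 3.0, 4.0])  # observed post-event signal
x1, delta = decompose(x2, L_path, L_path, gamma=0.5)
print(np.allclose(x1 + delta, x2))  # True: the split is exact by construction
```

By optimality, the recovered x1 is never rougher on G_1 than x2 itself, which is the intended smoothing effect.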

III. LABEL PROPAGATION AND SIGNAL DECOMPOSITION AS FEASIBILITY PROBLEMS
As discussed in the introduction, many instances of the change detection problem can be redefined as signal feasibility problems. In support of this idea, here we elaborate on the reformulation of two key problems: label propagation and signal decomposition. As explained in Sec. II, these problems serve as the basis of semi-supervised and unsupervised change detection schemes and can be transformed into feasibility problems.
Let α > 0 be a regularization constant. We now consider a regularized version of the LP problem (3), namely RLP, defined as follows:

minimize_{x ∈ R^n} x^T L_G x + α∥x∥_2^2 subject to x_i = f(i), i ∈ S.    (6)

Unlike label propagation, RLP penalizes constant components from the null space of L_G, which may improve labeling function estimates at extremely low label rates. Our interest, however, is to stay as close as possible to the solution of LP while leveraging the strong convexity of the regularized objective. We thus set α arbitrarily close to zero. By factoring L_G in terms of U and Λ as in (2) and applying the change of variables w = (Λ + αI_n)^{1/2} U^T x, the objective function in (6) can be expressed as ∥w∥_2^2. RLP can thus be reformulated in terms of w as follows:

minimize_{w ∈ R^n} ∥w∥_2 subject to e_i^T U(Λ + αI_n)^{−1/2} w = f(i), i ∈ S.    (7)

We now show that the solution to the RLP problem can be obtained via a feasibility problem.

Proposition 1. Let ϵ > 0, and define C as the ℓ_2 ball centered at 0 with radius ϵ, i.e., B_{ℓ2}(0, ϵ) = {w ∈ R^n : ∥w∥_2 ≤ ϵ}. Then the solution to RLP can be obtained from the feasibility problem

find w ∈ C such that e_i^T U(Λ + αI_n)^{−1/2} w = f(i), i ∈ S,    (8)

when ϵ is set as the optimal value of (7).
Proof. Let w̄ be the solution to (7); it is not difficult to show that w̄ is indeed unique. Then ∥w̄∥_2 < ∥w∥_2 for every other w such that e_i^T U(Λ + αI_n)^{−1/2} w = f(i), i ∈ S. Now, let w* be a solution to (8). Then ∥w*∥_2 ≤ ∥w̄∥_2 = ϵ and e_i^T U(Λ + αI_n)^{−1/2} w* = f(i), i ∈ S. However, by the uniqueness of w̄, there is no feasible point with norm less than or equal to ϵ other than w̄ itself, and therefore it must be that w* = w̄, which completes the proof.

Using a similar argument, a regularized version of problem (5) can be formulated as the following feasibility problem:

find (w, δ) ∈ C_1 × R^n such that U(Λ + αI_n)^{−1/2} w + δ = x_2 and L_G δ ∈ B_{ℓ2}(0, ϵ),    (9)

where U and Λ satisfy (2) with respect to the graph Laplacian L_{G_1} of the graph G_1. Note that in (9) the product set C_1 × R^n does not impose restrictions on δ, but we could certainly replace R^n with an ℓ_1-norm ball to promote sparsity in δ.

A. Signal feasibility formulations of change detection
Based on the feasibility problems (8) and (9), we now state our proposed semisupervised and unsupervised CD schemes. The key idea is to increase the level of abstraction and regard the linear operator U(Λ + αI_n)^{−1/2} in (8) and (9) as a particular instance of a more general graph-based linear operator H.
In our first scheme, we assume the signal of interest f to be a labeling function from the vertex set V = {1, . . ., n} into a binary set {0, 1}, where f indicates whether or not a change occurred over a certain period of time. Let m and k be integers such that 1 < m, k ≤ n, and consider a subset S = {s_1, . . ., s_m} of the vertex set V with cardinality |S| = m. Provided a few observations y_1, . . ., y_m of f at S, i.e., y_i = f(s_i), the problem of change detection can be reduced to finding a set of coefficients, or representation, w = (w_1, . . ., w_k)^T that leads to the observations through the prediction model

y_i = ϕ(e_{s_i}^T H w), i = 1, . . ., m,

where e_1, . . ., e_n denotes the standard basis, ϕ : R → R is a nonlinear activation function, and H : R^k → R^n is a linear operator whose range is a class of smooth graph signals on G. More formally, the problem is given by the following feasibility problem:

find w ∈ C such that ϕ(e_{s_i}^T H w) = y_i, i = 1, . . ., m,    (10)

where C imposes desirable convex ball constraints on w.
In our second scheme, we drop the requirement of a few labels by using a difference signal as a proxy for the labels. Specifically, in the same spirit as the works [2], [11], [12], [13], a post-event signal x_2 ∈ R^n can be modeled as the sum of a pre-event signal x_1 ∈ R^n and a difference signal δ, i.e., x_2 = x_1 + δ. Provided observations of x_2 and appropriate graph smoothness constraints on δ and x_1, the problem of CD can be formulated as follows:

find (w, δ) ∈ C such that Hw + δ = x_2 and Lδ ∈ B_{ℓ2}(0, ϵ),    (11)

where w denotes a representation of x_1 with respect to H, whose columns span a class of smooth signals that live on a graph G_1 inferred from observations of x_1 (i.e., the pre-event image), and L : R^n → R^n is a linear operator (e.g., the graph Laplacian) that measures the variation of δ on a graph G inferred from observations of both x_1 and x_2, i.e., the pre-event and post-event images. Similar to (9), we define C as a product space, i.e., the Cartesian product of two component sets C_1 and C_2, which impose desirable regularity constraints on w and δ, such as small ℓ_2 norm and sparsity, respectively.

IV. PROPOSED METHOD
To solve the proposed signal feasibility problems, we now introduce the concept of variational inequalities using the label propagation problem as an example. Variational inequalities provide a way to accurately describe the solution set of convex minimization problems, including the problems of interest. Building on this principle, we then show how to approach problems (10) and (11) and find accurate solutions using a provably convergent block-iterative algorithm.

A. Expressing label propagation's solution with a variational inequality
Let C = {x ∈ R^n : x_i = f(i), i ∈ S} denote the equality constraints in (3). Then the LP problem can be reformulated as the following unconstrained minimization problem:

minimize_{x ∈ R^n} x^T L_G x + i_C(x),    (12)

where i_C denotes the indicator function of the set C, that is, i_C(x) = 0 for x ∈ C and +∞ otherwise. By virtue of [16, Theorem 3 (Fermat's rule)], x is a minimizer of the objective function in (12) if and only if 0 belongs to the subdifferential of the objective function at x. More formally,

0 ∈ 2 L_G x + N_C(x),    (13)

where N_C(x) denotes the normal cone of C at x, defined by N_C(x) = {u ∈ R^n : ⟨u, y − x⟩ ≤ 0 for all y ∈ C} if x ∈ C, and N_C(x) = ∅ otherwise [16]. Consequently, if condition (13) holds, there exists u ∈ N_C(x) such that −2 L_G x = u. Therefore, we can solve (12) via the following variational inequality:

find x ∈ C such that ⟨2 L_G x, y − x⟩ ≥ 0 for all y ∈ C.    (14)
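A minimal sketch of solving this variational-inequality characterization of LP by projected gradient: a gradient step on the Dirichlet energy followed by projection onto the affine set C, which simply re-imposes the known labels. The step size and iteration count are illustrative choices, not prescribed by the paper:

```python
import numpy as np

def lp_projected_gradient(L, S, f_S, tau=0.1, iters=5000):
    """Projected gradient for the LP variational inequality over
    C = {x : x_i = f(i), i in S}: x <- proj_C(x - tau * grad),
    where grad = 2 L x and proj_C resets the labeled entries."""
    n = L.shape[0]
    x = np.zeros(n)
    x[S] = f_S                       # start inside C
    for _ in range(iters):
        x = x - tau * (2.0 * L @ x)  # gradient of x^T L x
        x[S] = f_S                   # projection onto the affine set C
    return x

# Hypothetical 6-node path graph with labeled endpoints.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(1)) - W
x = lp_projected_gradient(L, [0, n - 1], [1.0, -1.0])
print(np.round(x, 3))  # linear ramp from 1 to -1
```

The iterates converge to the same harmonic extension obtained by solving the optimality system directly, since the fixed points of this iteration are exactly the solutions of the variational inequality.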

B. Proposed method
The proposed feasibility problems (10) and (11) can be regarded as particular instances of the problem of signal recovery from nonlinear observations [16], [30], in particular of the following more general feasibility problem:

find z ∈ C such that ϕ_i(H_i z) = b_i, i = 1, . . ., m,    (15)

where the b_i's denote observations of z through the operators ϕ_i ∘ H_i. To clarify, note that (15) reduces to (10) when z := w, ϕ_i := ϕ, H_i := e_{s_i}^T H, and b_i := y_i, i = 1, . . ., m. To show that (15) reduces to (11), we rely on the following proposition.
Proposition 2. Let D be a nonempty closed convex subset of H, and define ϕ : x ↦ x − proj_D(x). Then ϕ(x) = 0 if and only if x ∈ D.

Then, using Proposition 2, the set-membership constraints in (11) can be expressed as nonlinear equations of the form ϕ_i(H_i z) = b_i with b_i = 0. We thus conclude that (15) can be reduced to (11).
When the set of nonlinear equations in (15), i.e., ϕ_i(H_i z) = b_i, i = 1, . . ., m, is consistent, the solution to (15) can be found using the algorithms developed in [30]. However, since our modeling assumptions and observations may be inaccurate, there may be no point that satisfies all equations exactly; we therefore allow our prior modeling constraints to be inconsistent. In this setting, as a result of [17], the solution to (15) can be reliably approximated by the following variational inequality:

find z ∈ C such that ⟨ Σ_{i=1}^m α_i H_i^*(ϕ_i(H_i z) − b_i), y − z ⟩ ≥ 0 for all y ∈ C,    (16)

where the weights α_i are nonnegative and sum up to one, and H_i^* denotes the adjoint of H_i. As demonstrated in [17, Propositions 4.1 and 4.5], not only is (16) a suitable relaxation of (15), but it also arises from an underlying convex optimization program. The process of obtaining such an inequality was illustrated above for the LP problem, which is convex, leading to the variational inequality (14). Under reasonable conditions on C, ϕ_i, and H_i, as demonstrated in [17, Proposition 4.6], Algorithm 1 is a provably convergent block-iterative fixed-point algorithm that converges weakly to a solution of problem (16). In the next subsections, these conditions are defined formally, and we show that they indeed hold in our setting.
Algorithm 1: Block-iterative fixed-point algorithm to solve (16). Data: initialize t_{1,0}, . . ., t_{m,0}, z_0 as zero vectors.

Implementation details. In Algorithm 1, we assume α_i = 1/m, i = 1, . . ., m, and set I_n = {1, . . ., m} for all n, which is referred to as the full activation strategy in [17]. More generally, we note that (17) defines the activation protocol of Algorithm 1 and provides an interesting mechanism to modulate and accelerate the solution of the problem. Another interesting activation protocol is a cyclic activation strategy over a partition of [m]. Let P > 0 be an integer, and let S_1, S_2, . . ., S_P be a partition of [m], i.e., [m] = ∪_{p=1}^P S_p with S_i ∩ S_j = ∅ for i ≠ j ∈ [P]. The cyclic activation is then given by I_n = S_{(n mod P)+1}. This can be particularly useful to reduce the iteration complexity in a semi-supervised regime when there is a sufficiently large set of annotated superpixels. Furthermore, if we would like to prioritize a subset S_1, we can define a collection of sets with repeated subsets {S_1, S_1, S_2, . . ., S_P} and let I_n cycle over this collection, thus giving more emphasis to S_1. The interested reader is referred to the seminal work [17], where the algorithm is treated in greater generality and more examples of activation protocols are provided.
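The cyclic and prioritized activation protocols described above can be sketched as a simple generator; the partition below is a hypothetical example with m = 6 observation indices:

```python
from itertools import cycle

def cyclic_activation(partition):
    """Yield index blocks I_n cycling over S_1, ..., S_P (a partition of
    {0, ..., m-1}); repeating a subset in `partition` activates it more
    often, which is the prioritization trick described in the text."""
    yield from cycle(partition)

# Partition of m = 6 indices into 3 blocks, with S_1 repeated to prioritize it.
blocks = [[0, 1], [0, 1], [2, 3], [4, 5]]
gen = cyclic_activation(blocks)
first_five = [next(gen) for _ in range(5)]
print(first_five)  # [[0, 1], [0, 1], [2, 3], [4, 5], [0, 1]]
```

Each yielded block plays the role of I_n at iteration n; a full activation strategy corresponds to a single block containing all of {0, ..., m−1}.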

C. Choice of ϕ
We now define formally a crucial property that the operators ϕ i must satisfy and show that this property holds for the relevant signal feasibility problems.
First signal feasibility problem. Here our interest is to set ϕ such that it maps real values into probabilities. Therefore, ϕ takes the form of standard activation functions such as the sigmoid or softmax. More precisely, for propagating binary labels, we define ϕ : R → [0, 1] to be the (unimodal) sigmoid function,

ϕ(x) = 1 / (1 + e^{−x}),

and for the case of multiclass classification with q labels, we define ϕ : R^q → [0, 1]^q to be the softmax activation function, whose i-th component is given by

ϕ(x)_i = e^{x_i} / Σ_{j=1}^q e^{x_j}, i = 1, . . ., q.

An important property of these activation functions, and many others, is that they can be shown to be proximity operators of functions in Γ_0(H), the set of proper, lower semicontinuous, convex functions [31]. As a result, they are firmly nonexpansive operators, or alternatively satisfy Definition 2, as stated in the following theorem. To show that ϕ is firmly nonexpansive, it suffices to note that, as a result of [31], there exists a function ϕ_0 ∈ Γ_0(H) such that prox_{ϕ_0} = ϕ. By virtue of the above theorem, ϕ is thus firmly nonexpansive.
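A quick numerical check of the firm-nonexpansiveness inequality ∥ϕ(x) − ϕ(y)∥² ≤ ⟨ϕ(x) − ϕ(y), x − y⟩ for the sigmoid and softmax on random inputs; this is only an illustrative sanity check, the formal argument being the proximity-operator result of [31]:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

# Firm nonexpansiveness: ||phi(x) - phi(y)||^2 <= <phi(x) - phi(y), x - y>.
ok = True
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    for phi in (sigmoid, softmax):
        d = phi(x) - phi(y)
        ok &= bool(d @ d <= d @ (x - y) + 1e-12)
print(ok)  # True
```

Intuitively, both operators are gradients of 1-smooth convex functions (softplus applied coordinatewise and log-sum-exp, respectively), which is one standard route to firm nonexpansiveness.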
Second signal feasibility problem. Here our interest is to exploit closed convex set constraints D, and the operators ϕ are given by ϕ : x ↦ x − proj_D(x), as in Proposition 2. Since the indicator function i_D of the set D lies in Γ_0(H) and its proximal operator is given by prox_{i_D} = proj_D [16], by virtue of Theorem 1, operators of the form ϕ : x ↦ x − proj_D(x) are firmly nonexpansive. Therefore, this type of operator can also be used to solve the change detection model based on signal decomposition using Algorithm 1.

D. Choice of C
The complexity of our estimates is controlled by the size and structure of C. For the sake of tractability, C should be a closed convex set. For the case of (10), in the unconstrained case we may set C as the k-dimensional real space C = R^k, or otherwise as a convex ball

B_{ℓp}(w_0, ϵ) = {w ∈ R^k : ∥w − w_0∥_p ≤ ϵ},

where w_0 denotes the center of the ball and ϵ its radius. The center w_0 can be used to specify an initial estimate of w, but in our experiments it is set as the zero vector. When p = 2, the effect on w is that of shrinkage, as in ridge regression. When p = ∞, i.e., C = B_{ℓ∞}(w_0, ϵ), the constraint set imposes both shrinkage and a cap on the values of w. Similarly, when p = 1, i.e., C = B_{ℓ1}(w_0, ϵ), the set C promotes sparsity in w.
For the case of (11), we define C as the Cartesian product of closed convex sets C_1 and C_2:

C = C_1 × C_2.

As in the previous case, here we can define both C_1 and C_2 as convex balls. A possible configuration is to impose shrinkage on w and sparsity on δ, so the sets can be chosen as C_1 = B_{ℓ2}(0, ϵ_1) and C_2 = B_{ℓ1}(0, ϵ_2). The iteration complexity of Algorithm 1 thus relies on our ability to compute the projection of z onto C efficiently. Luckily, this is the case for the convex ball constraints of interest. In general, for ϵ > 0, the projection of v onto B_{ℓp}(w_0, ϵ) can be written in terms of the unit ball centered at 0 as

proj_{B_{ℓp}(w_0, ϵ)}(v) = w_0 + ϵ proj_{B_{ℓp}(0, 1)}((v − w_0)/ϵ).

For p = 2, the projection onto the unit ball has the closed form [25]

proj_{B_{ℓ2}(0, 1)}(v) = v / max{∥v∥_2, 1}.

For p = ∞, we can rely on the Moreau decomposition stated in the following theorem.
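The recentering identity and the closed-form ℓ2 projection can be sketched as follows (the helper names are illustrative):

```python
import numpy as np

def proj_l2_unit(v):
    """Projection onto the unit l2 ball centered at 0: v / max(||v||_2, 1)."""
    return v / max(np.linalg.norm(v), 1.0)

def proj_ball(v, proj_unit, w0, eps):
    """Reduce projection onto B(w0, eps) to the unit ball at the origin:
    proj(v) = w0 + eps * proj_unit((v - w0) / eps)."""
    return w0 + eps * proj_unit((v - w0) / eps)

v = np.array([3.0, 4.0])
print(proj_l2_unit(v))                               # [0.6, 0.8] (norm 5 -> 1)
print(proj_ball(v, proj_l2_unit, np.zeros(2), 2.5))  # scaled to the radius-2.5 ball
```

The same `proj_ball` wrapper works for any norm once the corresponding unit-ball projection is available.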
Theorem 2 (Moreau decomposition). Let f ∈ Γ_0(H). Then, for every v ∈ H, prox_f(v) + prox_{f*}(v) = v, where f* denotes the convex conjugate of f.

The idea here is that we can leverage the proximal operator of the conjugate of f to compute the proximal operator of f, and vice versa. In the case where f = ∥·∥, its conjugate is f* = i_{{x : ∥x∥_* ≤ 1}}, the indicator of the unit ball induced by the dual norm ∥·∥_*. Since the ℓ_1 and ℓ_∞ norms are dual to each other, and since prox_{i_C} = proj_C for a nonempty closed convex set C, the projection onto the ℓ_∞ unit ball centered at 0 is given by

proj_{B_{ℓ∞}(0, 1)}(v) = v − prox_{∥·∥_1}(v),

where the proximal operator of the ℓ_1 norm is the well-known soft-thresholding operator.
Projecting onto the ℓ_1 unit ball via Theorem 2 is, however, not as efficient, because the proximal operator of the ℓ_∞ norm is less straightforward to compute. Here it is thus preferable to compute the projection directly from its definition. The projection onto the ℓ_1 unit ball centered at 0 is given by

proj_{B_{ℓ1}(0, 1)}(v) = v if ∥v∥_1 ≤ 1, and soft_λ(v) otherwise,

where λ > 0 is such that Σ_i max{|v_i| − λ, 0} = 1. The case ∥v∥_1 > 1 thus requires solving a piecewise linear equation to find λ (see [32]) and then applying soft-thresholding with parameter λ.
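Both projections can be sketched as follows. The ℓ∞ projection uses the Moreau decomposition above, and the ℓ1 projection solves the piecewise-linear equation for λ by simple bisection (one straightforward option; faster exact methods exist, see [32]):

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def proj_linf_unit(v):
    """Moreau decomposition: proj onto the l_inf unit ball equals
    v - prox_{||.||_1}(v), i.e., clipping entries to [-1, 1]."""
    return v - soft_threshold(v, 1.0)

def proj_l1_unit(v):
    """Projection onto the l1 unit ball: soft-threshold with the lambda
    solving sum(max(|v_i| - lambda, 0)) = 1, found here by bisection."""
    if np.abs(v).sum() <= 1.0:
        return v
    lo, hi = 0.0, np.abs(v).max()
    for _ in range(100):  # bisection on the decreasing piecewise-linear equation
        lam = 0.5 * (lo + hi)
        if np.maximum(np.abs(v) - lam, 0.0).sum() > 1.0:
            lo = lam
        else:
            hi = lam
    return soft_threshold(v, lam)

v = np.array([2.0, -0.5, 0.25])
print(proj_linf_unit(v))              # entries clipped to [-1, 1]
print(np.abs(proj_l1_unit(v)).sum())  # ~1.0: lands on the l1 sphere
```

Note that the ℓ∞ projection reduces to entrywise clipping, consistent with the "shrinkage and cap" interpretation given earlier.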

E. Choice of H
As elaborated in Sec. III, the LP solution can be accurately described by a signal that lies in the column space of H̄ = U(Λ + αI_n)^{−1/2}, from here on referred to as the RLP graph-based linear operator. In the context of our model, H can thus be suitably set as H̄.
Constructions of H that rely explicitly on the spectral basis of the graph Laplacian may nonetheless not scale well in settings where the number of superpixels is significantly large. A common approach to avoid direct computation of the eigenvectors is to leverage graph filtering techniques. Compressive spectral clustering is a successful example of this approach, where the eigenvectors associated with the smallest eigenvalues are approximated by efficient low-pass graph filtering of random signals defined on the nodes of the graph [33], [34]. To illustrate this concept, we exploit a particular class of graph filters, namely convolutional graph filters [35]. Specifically, we leverage the back-end of an untrained graph convolutional network (GCN), which is defined by a composition of graph filters (or diffusion operators) and nonlinear activation functions [34], [36]. More formally, let X ∈ R^{n×d} be a feature matrix on G. The back-end of a two-layer graph convolutional network with parameters Θ_1 ∈ R^{d×p} and Θ_2 ∈ R^{p×k} is given by [36]:

H = S_G ReLU(S_G X Θ_1) Θ_2,    (27)

where S_G = D̃_G^{−1/2} W̃_G D̃_G^{−1/2}, with W̃_G = I_n + W_G and D̃_G = diag(W̃_G 1). S_G denotes a graph shift operator [35], [14], which performs a smoothing operation on X over G. Similarly, by removing the ReLU(·) activation function in (27), we obtain the back-end of a two-layer simplified graph convolutional network (SGCN):

H = S_G^2 X Θ_1 Θ_2.    (28)

Even though we could learn the filter parameters Θ_1, Θ_2 from training data or via self-supervised learning, randomly initialized GCNs are cost-effective and produce surprisingly good representations in many cases, e.g., [37]. Precisely, we assume the filter parameter Θ_1 has entries drawn from a zero-mean Gaussian distribution and set Θ_2 as an identity matrix. Even though this broadens the spectrum of the signals X on G before filtering, on average the smoothness level of the randomly mixed signals can be shown to be the same as that of the individual signals in X.
Equally, at the feature level, since random projection matrices are distance-preserving, the relationships among feature vectors may not be drastically affected by the random transformation. The last filter parameter Θ_2 is set as an identity matrix to preserve the smoothness properties of the filtered signals. When H is defined in terms of the vectors resulting from these operations, we refer to it as 2-GCN or 2-SGCN in our experiments.
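A sketch of the untrained 2-GCN back-end under the stated assumptions (Gaussian Θ1, identity Θ2); the ring graph, feature dimensions, and 1/√d scaling are illustrative choices of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def shift_operator(W):
    """Normalized shift S_G = D^{-1/2} (I + W) D^{-1/2}, D = diag((I + W) 1)."""
    Wt = np.eye(W.shape[0]) + W
    d = Wt.sum(axis=1)
    Dinv_sqrt = np.diag(1.0 / np.sqrt(d))
    return Dinv_sqrt @ Wt @ Dinv_sqrt

def two_layer_gcn(W, X, k):
    """Back-end of an untrained two-layer GCN: random Gaussian Theta_1,
    identity Theta_2, with a ReLU between the two diffusions."""
    S = shift_operator(W)
    Theta1 = rng.normal(size=(X.shape[1], k)) / np.sqrt(X.shape[1])
    return S @ np.maximum(S @ X @ Theta1, 0.0)  # Theta_2 = I_k

# Hypothetical 5-node ring graph with random 3-dimensional node features.
n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
H = two_layer_gcn(W, rng.normal(size=(n, 3)), k=4)
print(H.shape)  # (5, 4): k smooth graph signals, one per column
```

Dropping the `np.maximum` (ReLU) call yields the 2-SGCN variant, where the two diffusions collapse into the single polynomial filter S_G².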


V. CHANGE DETECTION ON GRAPHS
Let c_1, c_2, n_1, n_2 > 0 be positive integers, and consider a collection of c_1 + c_2 suitably preprocessed images of an area of interest, X_1, . . ., X_{c_1}, X_{c_1+1}, . . ., X_{c_1+c_2}, that reside on a two-dimensional domain Ω = {1, . . ., n_1} × {1, . . ., n_2}, where {X_l}_{l=1}^{c_1} and {X_{c_1+l}}_{l=1}^{c_2} denote pre-event and post-event multichannel images. The goal of change detection is to construct a change map F : Ω → {0, 1} that takes coordinates in Ω to a binary label, which determines whether the land cover has changed or not.

Since our goal is to exploit pairwise relationships among image superpixels (thereby reducing the time and space complexity of graph inference and label estimation), we define a superpixel function sp : Ω → V, which maps coordinates in the high-resolution domain Ω into a reduced set of linear coordinates V = {1, . . ., n}. The function sp thus induces a partition Ω = ∪_{i=1}^n Ω_i such that Ω_i ∩ Ω_j = ∅ for i ≠ j ∈ V, where Ω_i = {(j_1, j_2) ∈ Ω : sp(j_1, j_2) = i}.
Based on {Ω_i}_{i∈V}, we define two generic operations to extract signals on V from the pre-event and post-event images {X_l}_{l=1}^{c_1} ∪ {X_{c_1+l}}_{l=1}^{c_2}.

Definition 3. The average signals X = (x_{ij}) ∈ R^{n×c} on V associated with a collection of images {X_j}_{j=1}^{c} on Ω have entries x_{ij} = mean(X_j(Ω_i)) for j = 1, ..., c, where mean(X_j(Ω_i)) computes the average value of the image X_j over the superpixel Ω_i.

Definition 4. Let r be an integer s.t. 0 < r << min(n_1, n_2). The feature signals X on V with parameter r associated with a single image X consist of rows x_i = vec(patch(X)_{c,r}) ∈ R^{(2r+1)^2}, i = 1, ..., n, where c denotes the centroid of Ω_i, patch(X)_{c,r} extracts a square patch of size (2r+1) × (2r+1) centered at c from X, and vec(·) denotes the vectorization operator that transforms a matrix into a vector.
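A minimal sketch of Definition 3, assuming the superpixel function `sp` is given as an integer label image and using a helper name (`average_signals`) of our own:

```python
import numpy as np

def average_signals(images, sp):
    """Average signal matrix X in R^{n x c} (Definition 3): entry x_ij is the
    mean of image X_j over superpixel Omega_i. `images` is a list of c arrays
    of shape (n1, n2); `sp` maps each pixel to a superpixel id in {0,...,n-1}."""
    n = sp.max() + 1
    X = np.zeros((n, len(images)))
    counts = np.bincount(sp.ravel(), minlength=n)
    for j, img in enumerate(images):
        sums = np.bincount(sp.ravel(), weights=img.ravel(), minlength=n)
        X[:, j] = sums / counts
    return X
```

The feature signals of Definition 4 are built analogously, by extracting and vectorizing a (2r+1) × (2r+1) patch around each superpixel centroid instead of averaging.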

A. Learning graphs from data
Learning graph structure from satellite imagery over a set of superpixels is arguably the most fundamental step in graph-based change detection. This process consists of inferring a graph on which a collection of training signals satisfies desirable properties (e.g., smoothness) and which has advantageous structural characteristics such as connectedness and sparsity.
In this work, given a collection of average signals X̃_1 ∈ R^{n×c_1} and X̃_2 ∈ R^{n×c_2} on a vertex set V = {1, ..., n}, extracted from the pre-event and post-event images respectively as in Definition 3, we construct a graph such that these signals are smooth on the graph. This assumption has been instrumental in recent literature [11]. To do so, we search for the graph G by solving an instance of the optimization program in (30), where the parameters σ_1, σ_2 > 0 are user-defined and modulate the contribution of each modality to the computation of pairwise distances between nodes. The objective function measures the smoothness of the collection of signals (X̃_1/σ_1, X̃_2/σ_2) on the graph. By rewriting the objective function in terms of the weight function, with dist_k(i, j) = ∥(e_i − e_j)^T X̃_k∥_2, k = 1, 2, we can search for G by optimizing over the set of weight functions w : V × V → R that satisfy w(i, j) ≥ 0, w(i, j) = w(j, i), and w(i, j) = 0 for (i, j) ∉ E_0, where E_0 ⊂ V × V is a set of user-defined edges. In our context, the edge set E_0 is defined as the edge set of a K-NN graph, where the K nearest neighbors i_1, ..., i_K ∈ V of node i are those minimizing the combined distance dist(i, j) = (dist_1(i, j)^2/σ_1^2 + dist_2(i, j)^2/σ_2^2)^{1/2} over j ∈ V \ {i, i_1, ..., i_K}. In this setting, all constraints on w can be easily imposed except the non-negativity constraint. A simple way to relax this constraint is to use the entropic penalty [38]. To simplify notation, we denote w(i, j) by w_{ij} and dist(i, j) by d_{ij}.
The optimization problem thus becomes an entropy-regularized weight selection problem with regularization parameter σ > 0. Its solution has the closed form given in (31) [39], w_{ij} = κ exp(−d_{ij}^2/σ^2), i.e., a Gaussian kernel, where κ > 0 is a normalization constant. A slightly more complex model proposed by [39], given in (32), allows one to control the sparsity of the edge set while maintaining the connectedness of the graph, where θ > 0 controls the sparsity of the graph's edge set.
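The closed-form weights can be sketched as follows; the combined distance and the K-NN restriction follow the text, while the global normalization playing the role of κ is our illustrative choice, as is the function name:

```python
import numpy as np

def gaussian_knn_weights(X1, X2, sigma1, sigma2, K, sigma):
    """Closed-form weights under the entropic penalty (a sketch):
    w_ij = kappa * exp(-d_ij^2 / sigma^2) restricted to a K-NN edge set E0,
    with d_ij^2 = dist1^2/sigma1^2 + dist2^2/sigma2^2 the combined distance."""
    def sq_dists(X):
        # pairwise squared Euclidean distances between rows of X
        G = X @ X.T
        d = np.diag(G)
        return d[:, None] + d[None, :] - 2 * G

    D = sq_dists(X1) / sigma1**2 + sq_dists(X2) / sigma2**2
    n = D.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D[i])
        nbrs = [j for j in idx if j != i][:K]   # K nearest neighbors of i
        W[i, nbrs] = np.exp(-D[i, nbrs] / sigma**2)
    W = np.maximum(W, W.T)   # symmetrize the K-NN edge set
    W /= W.sum()             # normalization constant kappa (illustrative choice)
    return W
```

The sparsity-controlled model in (32) has no such closed form and requires the iterative algorithm of [39].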
Even though the solution to this problem cannot be found in closed form, it can be found efficiently using the algorithm proposed by [39], where a heuristic to set θ is also proposed. Although this method is a general-purpose structure learning algorithm, in the context of change detection it has been found to produce notable results, as shown in [11].
Parameter settings: in our experiments, we will set the number of nearest neighbors K = ceil(√n), based on the intuition that the number of neighbors should be proportional to the number of nodes but grow slowly as n increases, so that the cardinality of the edge set does not grow prohibitively large. Moreover, the normalization constants σ_1, σ_2 will be set in a data-driven manner; similarly, when the Gaussian kernel graph is used, we will set σ such that (3σ

B. Change map estimation
In a semisupervised regime, we learn a graph G from pre- and post-event average signals constructed as in Definition 3 using (32), where each vertex represents a superpixel. Given a few annotated vertices, we can estimate the labels of the remaining nodes via label propagation or the proposed algorithm. This results in a labeling function estimate f on V.
We then obtain a high-resolution change map F on the image domain Ω, with entries F_{ij} = f(sp(i, j)). When the output of the semisupervised method is a probability map, as in graph convolutional networks with a sigmoid output layer and in our proposed semisupervised method, we decide that a node with probability greater than 0.5 is changed; otherwise, it remains unchanged. Even though this is a natural threshold, it may not be optimal, because a small numerical instability or perturbation could take a node from changed to unchanged or vice versa. More elaborate thresholding methods can thus lead to improved change detection performance.
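The lift from node-level probabilities to the pixel grid reduces to array indexing, as in this sketch (the function name is ours):

```python
import numpy as np

def change_map(f, sp, thresh=0.5):
    """Lift node-level probabilities f (length n) to a pixel-level binary
    change map F on Omega via the superpixel map sp: F_ij = [f(sp(i,j)) > thresh]."""
    return (f[sp] > thresh).astype(np.uint8)
```

Because `sp` is an integer label image, `f[sp]` broadcasts the node values back to full resolution in a single vectorized operation.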
In an unsupervised regime, in addition to G, we learn a graph G_1 only from pre-event average signals using (31). Therefore, we set dist_2(i, j)^2 = 0 and dist(i, j)^2 = dist_1(i, j)^2/σ_1^2. Then, for each column x_{2,j} of X̃_2, we estimate pre-event and difference signals x̂_{1,j}, δ̂_j ∈ R^n using (5) or the proposed signal feasibility problem (11). We then form the difference signal's magnitude |δ̂| ∈ R^n, with entries |δ̂|_i = (Σ_j δ̂_{ij}^2)^{1/2}. Based on this magnitude, we can estimate the change map by labeling nodes with larger magnitudes as change and those with smaller magnitudes as non-change. To do so, we first obtain a high-resolution estimate ∆ = (∆_{ij}) on Ω with entries ∆_{ij} = |δ̂|(sp(i, j)). Then we obtain a pair of decision thresholds using Otsu's multilevel threshold method [40], and decide that intensity values above the larger threshold denote changes in the scene while the rest do not. As will be shown in the experiments, a single threshold can also be estimated by assuming that the magnitude values come from a bimodal distribution, but the performance metrics may suffer.
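A numpy-only sketch of the two-threshold Otsu rule used above, via exhaustive search over histogram bins (scikit-image's `threshold_multiotsu` provides an equivalent, faster routine; the function name and bin count here are our choices):

```python
import numpy as np

def multi_otsu_two_thresholds(values, bins=64):
    """Two-threshold Otsu under a three-modal assumption: pick (t1, t2)
    maximizing the between-class variance of the induced three classes."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    mids = 0.5 * (edges[:-1] + edges[1:])
    best, best_t = -1.0, (edges[1], edges[2])
    for a in range(1, bins - 1):
        for b in range(a + 1, bins):
            var = 0.0
            # between-class variance (up to a constant) = sum_k w_k * mu_k^2
            for sl in (slice(0, a), slice(a, b), slice(b, bins)):
                w = p[sl].sum()
                if w > 0:
                    mu = (p[sl] * mids[sl]).sum() / w
                    var += w * mu * mu
            if var > best:
                best, best_t = var, (edges[a], edges[b])
    return best_t

# Decision rule from the text: magnitudes above the larger threshold are changes.
```

For a bimodal assumption (single threshold), the same search runs over a single split point.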

VI. EXPERIMENTAL RESULTS
This section shows the capabilities of the proposed change detection method in both semisupervised and unsupervised regimes. We consider six well-known change detection datasets, namely Alaska, Atlantico, Mulargia [11], California [6], Toulouse [4], and Shuguang [41]. Alaska and Mulargia Lake denote homogeneous multispectral (MS) datasets of ice melting and flood events captured by Landsat-5 TM. Atlantico denotes a homogeneous synthetic aperture radar (SAR) dataset of a dam-induced flood captured by ALOS/PALSAR. California and Shuguang denote heterogeneous datasets of flood and construction events captured by Landsat-8 and Sentinel-1A, and by Radarsat-2 and Google Earth, respectively. California consists of MS and SAR images, and Shuguang of SAR and RGB optical images.

A. Pre-processing and superpixel function
Satellite imagery from heterogeneous sources is often affected by noise, outlying variations, and scale differences that may affect the performance of any change detection method. Therefore, before applying the change detection methodology in Sec. V, we perform a series of preprocessing steps on the pair of raw multichannel satellite images associated with the pre- and post-event. The aim, however, is not to distort or compromise any relevant statistical properties of the data, in our case, pairwise relationships among feature vectors, which are vital to capture the geometry of the underlying feature space using graphs.
First, we run a 5-by-5 median filter through each image channel to remove possible outlying variations localized on relatively small, dispersed supports. Second, we clip (or saturate) each multichannel image to be within the standard outlier detection interval defined by its median and interquartile range, computed over a flattened version of the respective multichannel image. This step has to be done carefully so that undesirable nonlinear effects (e.g., large constant intensity blobs) do not severely affect pairwise relationships in the data, leading to such blobs being perceived as changes on the scene. This is the case for Shuguang, and therefore it did not undergo this preprocessing step. Lastly, we normalize each multichannel image by its range, where the range is defined as the difference between the maximum and minimum value of the vectorized version of the multichannel image. Then, we form the collection of images in (29) using the preprocessed multichannel images, where the superpixel function sp : Ω → V is defined using the SLIC superpixels algorithm [42] on a pseudo-RGB image formed by the average preprocessed pre- and post-event multichannel images and their absolute difference image scaled by its range [11]. For all considered datasets, we set the number of superpixels to 2400, with compactness and sigma parameters equal to 10 and 1, respectively. Table I provides a brief description of the considered datasets and the obtained number of superpixels, as well as the number of positive superpixels as defined by majority vote with the ground-truth labels.
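The clipping and normalization steps can be sketched as below. The 1.5·IQR multiplier is the conventional outlier rule and is our assumption (the text only names the interval), and the 5×5 median filter and SLIC steps (available in scipy and scikit-image) are omitted:

```python
import numpy as np

def preprocess(img):
    """IQR clipping and range normalization of a multichannel image of shape
    (n1, n2, c), following the preprocessing described in the text."""
    flat = img.reshape(-1)
    med = np.median(flat)
    q1, q3 = np.percentile(flat, [25, 75])
    iqr = q3 - q1
    # clip to the standard outlier interval [med - 1.5*IQR, med + 1.5*IQR]
    out = np.clip(img, med - 1.5 * iqr, med + 1.5 * iqr)
    # normalize by the range of the clipped image
    rng = out.max() - out.min()
    return (out - out.min()) / rng if rng > 0 else out
```

The SLIC step would then be, e.g., `skimage.segmentation.slic(pseudo_rgb, n_segments=2400, compactness=10, sigma=1)`.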

B. Change probability and change map estimation from a few labeled nodes
We now leverage the model (10) to produce probability maps that comply with a few annotated superpixels, and construct change detection maps by thresholding. We begin by learning a graph G on a set of superpixels from pre- and post-event average signals as in Definition 3, using the graph structure learning (GSL) model in (32). Next, we generate annotated superpixels by considering a partition of the vertex set V into positive and negative nodes, i.e., V = V_+ ∪ V_- with V_+ ∩ V_- = ∅. We then select subsets of positive and negative samples S_+ ⊂ V_+ and S_- ⊂ V_- uniformly at random such that the subsets have the same cardinality |S_+| = |S_-| = m/2. Lastly, we join the subsets into S = S_+ ∪ S_-, create an ordering of its elements such that S = {s_i}_{i=1}^{m}, and set the labels y = (y_1, ..., y_m)^T such that, for all i ∈ [m], y_i = 1 if s_i ∈ S_+ and y_i = 0 if s_i ∈ S \ S_+. Figure 2 shows an example of superpixel boundaries and a few resulting annotated superpixels for the Alaska dataset. Given G, S, and y, we run our approach under two settings for all datasets. The first setting defines H as H = U(Λ + 10^{-3} I_n)^{-1/2}, as elaborated in Sec. IV-F, which is referred to as RLP. The second setting defines H as the back-end of a two-layer GCN (27) with feature signals X ∈ R^{n×(2r+1)^2(c_1+c_2)} such that X = (X_1/σ_1, X_2/σ_2), where X_1 ∈ R^{n×(2r+1)^2 c_1} and X_2 ∈ R^{n×(2r+1)^2 c_2} denote pre-event and post-event feature matrices constructed as in Definition 4 with patch radius r = 2. For both settings, we evaluate two cases of C as zero-centered convex balls B^{ℓ2}_{0,ϵ} and B^{ℓ∞}_{0,ϵ} with radius ϵ. To set ϵ, we first run Algorithm 1 in the unconstrained case, i.e., C = R^n, for N = n iterations, and obtain z̄_N and the auxiliary vectors t̄_{0,N}, ..., t̄_{m,N}. The value of ϵ is then set as a fraction of ∥z̄_N∥_2, specifically ϵ = 0.75∥z̄_N∥_2. After obtaining ϵ, we run Algorithm 1 for ceil(N/2) more iterations with C as either B^{ℓ2}_{0,ϵ} or B^{ℓ∞}_{0,ϵ}, but this time we initialize t_{1,0}, ..., t_{m,0} and z_0 with the values t̄_{0,N}, ..., t̄_{m,N} and z̄_N obtained in the unconstrained case. This can be thought of as a warm-up step that better initializes the algorithm and determines ϵ.
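The constrained iterations only require projections onto the two balls, which are cheap and available in closed form; a minimal sketch (function names are ours):

```python
import numpy as np

def proj_l2_ball(z, eps):
    """Projection onto the zero-centered l2 ball B_{0,eps}: rescale if outside."""
    nrm = np.linalg.norm(z)
    return z if nrm <= eps else (eps / nrm) * z

def proj_linf_ball(z, eps):
    """Projection onto the zero-centered l-infinity ball: clip coordinatewise."""
    return np.clip(z, -eps, eps)
```

The warm-up heuristic of the text then amounts to running the unconstrained iterations, setting `eps = 0.75 * np.linalg.norm(z_N)`, and continuing with one of these projections active.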
Table II reports the average and standard deviation of different commonly used metrics over 10 trials, where the variability comes from the random selection of S. The number of labels per class, m/2, is arbitrarily chosen as 6.5% of the total number of positive superpixels, i.e., m/2 = ceil(0.065|V_+|), and change maps are constructed by thresholding the estimated probability maps at 0.5. The table also includes results obtained with label propagation, a two-layer GCN, and a two-layer simplified GCN (SGCN) trained from y with the same feature matrix X. To train the networks, we use the Adamax optimizer with learning rate 10^{-2} and weight decay 10^{-4}. Figure 3 displays some estimated probability maps. The probability maps highlight more details on the scene and help establish some level of confidence in determining whether a change has occurred. Also in Fig. 3, we include confusion change maps for visual evaluation of the CD performance at the threshold of 0.5. Based on the figure and the quantitative results in the associated table, we can observe that the proposed methods perform as well as the competing approaches and sometimes outperform them. Moreover, the linear operator defined by the randomly initialized 2-GCN back-end performs notably well compared with the linear operator defined using the spectral basis of the graph Laplacian. Thus, our approach has the potential to scale well in large-scale settings. In the case of Toulouse, the untrained 2-GCN back-end does not perform so well, and neither does the trained 2-GCN network. The trained two-layer SGCN performs as well as the proposed method with RLP. By tuning the random weights, perhaps by means of self-supervised learning, we may be able to configure a better-performing change detection method that relies on neither the labels nor the spectral decomposition of the graph Laplacian.

C. Unsupervised change map estimation using signal decomposition
We now evaluate the capability of our approach to construct change maps for the above-described datasets without annotated superpixels, and compare our results against the state-of-the-art graph-based and deep learning-based methods IST-CRF [41], GIR-MRF [43], CA-Ae [9], and X-Net [8]. In this context, we consider the proposed signal feasibility problem in (11). We thus require a graph G on which both pre-event and post-event signals are simultaneously smooth, and a graph G_1 on which pre-event signals are sufficiently smooth. Here, we can use the same graph G as in the semisupervised regime, constructed as in (32). In addition, we construct the graph G_1 from pre-event average signals X̃_1, using a K-NN graph with Gaussian weights as in (31). Unlike our previous semisupervised experiment, we define here H using the graph G_1, as elaborated in Sec. IV-F. To estimate ϵ_1 and ϵ, as in the previous experiment, we run Algorithm 1 for N = n iterations with C = R^n × R^n, and define the smoothness parameter ϵ of B^{ℓ2}_{0,ϵ} initially as ∥L_G δ̄∥_2, where δ̄ is the average empirical difference signal obtained from the observed pre-event and post-event signals. This produces estimates w̄_N and δ̄_N of w and δ, and a set of associated auxiliary variables. The final estimates of w and δ are then obtained by running Algorithm 1 for ceil(N/2) more iterations using C = B^{ℓ2}_{0,ϵ_1} × B^{ℓ2}_{0,ϵ} with parameters ϵ_1 = 0.75∥w̄_N∥_2 and ϵ = 0.15∥δ̄_N∥_2, where the fractions 0.75 and 0.15 are selected arbitrarily to produce reasonable results. However, more judicious cross-validation studies are advisable for improved performance.
Figure 4 displays the estimated average pre-event image, the original average post-event image, and the estimated difference signal's magnitude image. By simple inspection of the resulting pre-event image and the post-event image, we can already detect some important changes in the scene; similarly, the magnitude image of the difference signal seems to highlight those changes as well. Unlike the semisupervised approach, we cannot simply set a threshold at 0.5 for change detection. Rather, we rely on statistical assumptions on the distribution of the magnitude values to determine a suitable threshold. In doing so, we use Otsu's multi-threshold method, assuming that the values come from a multimodal distribution with three modes. The fourth and fifth columns of the same figure show a three-class segmentation of the magnitude image obtained by Otsu's method and the confusion change map obtained using the largest threshold. By looking at the true positives and true negatives in the confusion change map, we can see that our method is able to detect major changes accurately. As will be illustrated in the ablation studies, a finer analysis with a sufficiently large number of superpixels can lead to improved change detection.
Table III presents the performance metrics for all datasets and different configurations of the proposed method. These configurations include two choices of the set C_1, namely ℓ_2 and ℓ_∞ balls, and change detection based on a single threshold (ST) or two thresholds (MT) obtained using Otsu's method, which assumes that the values of the magnitude image follow either a bimodal or a three-modal distribution, respectively. The table reports results for the Atlantico and Shuguang datasets using the 2-GCN back-end as H, since it demonstrated better performance compared to the RLP matrix. For Atlantico, instead of solving for w and δ with the constraint Hw + δ = x_2 given x_2, we utilize Hw − δ = x_1 given x_1. In this case, H is associated with a graph G_2 linked to the observed post-event average signals.
Our preliminary experiments reveal a performance asymmetry when the assumption order is reversed for certain datasets, which will be discussed shortly.
We note that the comparative methods IST-CRF [41] and GIR-MRF [43] were executed using the open-source code provided by the authors, employing their proposed superpixels and suggested parameter values: N_iter = 7, α = 9, β = 9 for IST-CRF, and PenaltyType = 'Fro', β = 1, λ = 0.1, and η = 0.05 for GIR-MRF. Similarly, the deep learning-based methods CA-Ae [9] and X-Net [8] were executed using the authors' publicly available code. Please note that there might be variations in the results presented here compared to those in the original papers, due to possible differences in datasets and the number of random trials. As observed, our performance metrics are comparable to, and in some cases outperform, the state-of-the-art methods, even though our approach is less specialized for the CD task and does not rely on meticulous hyperparameter selection.

D. Ablation study on the number of superpixels and structure learning algorithm
In this subsection, we characterize the performance of our method as we vary the number of superpixels and the graph structure learning (GSL) method. To that end, we compare the previously used GSL method defined by (32) (henceforth, Kalofolias) with the graph obtained by solving (30) over the family of unweighted, connected, and acyclic graphs, namely trees. A tree G = (V, E) has the sparsest edge structure, with cardinality |E| = |V| − 1, that enables any two nodes u ≠ v ∈ V to be connected on G. The underlying GSL problem is equivalent to finding a tree of minimal length on a distance matrix, or alternatively the minimum spanning tree (MST), henceforth MST with binary weights. To remove performance variation due to the thresholding method (e.g., Otsu's method), we evaluate the detection capabilities of the produced probability maps and difference images using the area under the receiver operating characteristic (ROC) curve, namely the AUC. In the semisupervised regime, we use the same label rate defined in our previous experiment. In the unsupervised regime, the GSL method used to construct the Gaussian kernel graph G_1 is kept fixed.
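Learning the MST with binary weights only needs a pairwise distance matrix; a self-contained Prim's-algorithm sketch (function name is ours; `scipy.sparse.csgraph.minimum_spanning_tree` is a faster alternative):

```python
import numpy as np

def mst_binary_adjacency(D):
    """Prim's algorithm on a pairwise distance matrix D: returns the binary
    adjacency of a minimum spanning tree, i.e., the sparsest connected graph
    with |E| = |V| - 1 edges used in the ablation study."""
    n = D.shape[0]
    A = np.zeros((n, n))
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    cost = D[0].copy()              # cheapest known connection to the tree
    parent = np.zeros(n, dtype=int)
    for _ in range(n - 1):
        # pick the out-of-tree node with the cheapest connection
        j = int(np.argmin(np.where(in_tree, np.inf, cost)))
        A[j, parent[j]] = A[parent[j], j] = 1.0
        in_tree[j] = True
        # update cheapest connections through the newly added node
        closer = D[j] < cost
        cost[closer] = D[j][closer]
        parent[closer] = j
    return A
```

Assigning binary weights to the selected edges yields the "MST with binary weights" graph compared against the Kalofolias graph.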
Figures 5 and 6 summarize the obtained results for the Alaska and California datasets. We can observe that, despite changing the GSL method, the performance metrics follow similar trends with the number of superpixels (see Fig. 5). Also, their average behavior across the different superpixel sets is almost the same (see Fig. 6). Not only does this indicate a desirable consistent behavior across datasets, but it also shows that, on average, the AUC values are higher than 0.75 regardless of the cardinality of the superpixel set and the considered dataset, indicating the effectiveness of our method. Similarly, despite small average differences in AUC values, the Kalofolias graph outperforms the MST graph with binary weights in the semisupervised regime.
Also in Fig. 6, we report other common average performance metrics for a threshold resulting in a false positive rate (FPR) of 0.05. Recall that as we decrease the threshold, the FPR value increases and the true positive rate (TPR) tends to one. For the selected threshold, our method achieves a TPR value significantly greater than 0.05 on average, regardless of the number of superpixels. This indicates that our change detection method operates above the identity line in the ROC space, which is of course a desirable property [44]. With a sufficiently large set of superpixels, our method can achieve even higher TPR values, as indicated by the extent of the error bars in the figure. In the same setting, commonly used metrics such as Cohen's kappa coefficient and the F1-score align with those obtained by state-of-the-art methods. The proposed method thus exhibits robust and stable behavior against variations in the GSL method and the number of superpixels, as demonstrated by Fig. 6.
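The fixed-FPR operating point can be computed from scores and ground-truth labels as sketched below (function name is ours):

```python
import numpy as np

def tpr_at_fpr(scores, labels, target_fpr=0.05):
    """TPR at the threshold whose FPR matches target_fpr: the threshold is the
    (1 - target_fpr) quantile of the negative-class scores, so that a fraction
    target_fpr of negatives exceeds it; report the fraction of positives above it."""
    neg = scores[labels == 0]
    pos = scores[labels == 1]
    t = np.quantile(neg, 1.0 - target_fpr)
    return float((pos > t).mean())
```

Sweeping `target_fpr` over (0, 1) traces the full ROC curve, whose area is the AUC reported in the ablation.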

VII. DISCUSSION AND CONCLUDING REMARKS
This paper proposes a new approach to the problem of change detection, which leverages recent advances in signal construction from possibly inconsistent prior modeling constraints and regularity assumptions [17], [16]. In particular, we show that at least two change detection schemes, operating in semisupervised and unsupervised regimes, can be cast as instances of a more general class of signal feasibility problems. Since their solution can be expressed as a variational inequality associated with a convex optimization program, we can leverage a provably convergent block-iterative fixed-point algorithm to construct solutions that are efficient, stable, and parallelizable, leading to scalability in large-scale settings.
By taking this approach, we arrive at actionable principles that can be used to design change detection methods relying on much more general notions of graph signal smoothness. Not only do these improve our understanding of change detection based on graph subspace prior models, but they also open the door to integrating elements of more recent learning paradigms such as self-supervised learning [45]. Notably, we show that the back-end of a GCN with random weights spans a space of smooth signals that is suitable for change detection. By adjusting the random weights using a form of self-supervised learning, the representation properties of the induced space of smooth graph signals may radically improve. Unlike our configuration based on the spectral decomposition of the graph Laplacian, i.e., RLP, the space of signals induced by the back-end of GCNs can exploit additional features and scale much better on large graphs, as GCNs exploit the much more fundamental concept of graph filtering. Therefore,

Fig. 1. Proposed graph-based change detection methodology based on signal feasibility problems and variational inequalities.
known as the variational inequality problem of L_G with respect to C. The following definition states the general concept.

Definition 1 ([29, Chs. 25-26]). Let C be a closed convex set of a Hilbert space H and let B : H → H be a monotone operator. The variational inequality of B with respect to C is: find x ∈ C s.t. ∀y ∈ C, ⟨y − x, Bx⟩ ≥ 0.

Proposition 2 ([16, Problem 62]). Consider a pair of Hilbert spaces H and G. Let D ⊂ G be a closed convex set and let 0 ≠ L : H → G be a linear operator. Then Lx ∈ D ⟺ Lx − proj_D Lx = 0. Now, let z = (w, δ). Moreover, let H_1 = (H, I_n) be such that H_1 z = Hw + δ and

Definition 2 ([29, Proposition 4.2]). Let D be a nonempty subset of a Hilbert space H and let ϕ : D → H. Then ϕ is firmly non-expansive if and only if ∥ϕ(x) − ϕ(y)∥² + ∥(Id − ϕ)(x) − (Id − ϕ)(y)∥² ≤ ∥x − y∥² for all x, y ∈ D.

Theorem 1 ([16, Example 11]). Let f lie in the set Γ_0(H) of proper, lower semicontinuous, convex functions. Then prox_f and Id − prox_f are firmly non-expansive operators.

Fig. 2. Summary of the Alaska dataset. From left to right: average pre-event and post-event images, ground-truth change map, and a few superpixels annotated uniformly at random for a given class.

Fig. 3. From left to right: ground truth, change map estimated by LP, change probability maps estimated by the graph neural networks, and change probability maps estimated by the proposed approaches. Every other row displays confusion change maps obtained from the probability maps with a threshold of 0.5. The number of annotated samples per class was about 6.25 percent of the minority class's cardinality and corresponds to the trial with the highest kappa value.

Fig. 4. Results of the proposed unsupervised change detection method with C_1 defined by B^{ℓ∞}_{0,r}. For Alaska, California, Mulargia, and Toulouse, H is defined by RLP. For Atlantico and Shuguang, H is defined by 2-GCN. From left to right, we show the estimated average pre-event image, the original average post-event image, the estimated difference signal's magnitude image, the magnitude image's segmentation into three classes, the confusion change map, and the ground-truth change map. For segmentation and change detection, multi-threshold Otsu's method with the assumption of three modes was used. Here blue, bluish-green, and yellow colors indicate segments with smaller, medium, and larger magnitudes of the difference signal, respectively.

TABLE I. DESCRIPTION OF DATASETS. COLUMNS n, n+ STORE THE TOTAL NUMBER OF SUPERPIXELS AND THE NUMBER OF POSITIVE SUPERPIXELS, RESPECTIVELY. A SUPERPIXEL IS SAID TO BE POSITIVE IF THE MAJORITY OF ITS PIXELS HAVE A POSITIVE LABEL ASSIGNED BY THE GROUND-TRUTH CHANGE MAP, AND IS SAID TO BE NEGATIVE OTHERWISE.

Toulouse denotes a heterogeneous RGB dataset of an urban construction event captured by Pleiades and WorldView-2. Table I provides some specifics of each dataset.