Learning Hypergraphs Tensor Representations From Data via t-HGSP

Representation learning that accounts for high-order relationships in data has recently been shown to be advantageous in many applications. The construction of a meaningful hypergraph plays a crucial role in the success of hypergraph-based representation learning methods, which is particularly useful in hypergraph neural networks and hypergraph signal processing. However, a meaningful hypergraph may only be available in specific cases. This paper addresses the challenge of learning the underlying hypergraph topology from the data itself. As in graph signal processing applications, we consider the case in which the data possesses certain regularity or smoothness on the hypergraph. To this end, our method builds on the tensor-based hypergraph signal processing framework (t-HGSP) that has recently emerged as a powerful tool for preserving the intrinsic high-order structure of data on hypergraphs. Given the hypergraph spectrum and frequency coefficient definitions within the t-HGSP framework, we propose a method to learn the hypergraph Laplacian from data by minimizing the total variation on the hypergraph (TVL-HGSP). Additionally, we introduce an alternative approach (PDL-HGSP) that improves the connectivity of the learned hypergraph without compromising sparsity and uses primal-dual-based algorithms to reduce the computational complexity. Finally, we combine the proposed learning algorithms with novel tensor-based hypergraph convolutional neural networks to propose hypergraph learning-convolutional neural networks (t-HyperGLNN).


I. INTRODUCTION
Many graph signal processing applications rely on graph structures naturally chosen from the application domain, e.g., geographical or social networks. There are, however, still a large number of instances in which the underlying graph topology is not readily available. In fact, common choices of graphs for models may not necessarily describe well the intrinsic relationships between the entities in the data [1]. In such cases, when the underlying graph structure is not available, many algorithms have been developed under the umbrella of graph signal processing (GSP) with the goal of revealing the graph topology by leveraging the relationship between the signals and the topology of the graph on which they are supported. Graphs, however, are limited in the sense that they only account for pairwise node interactions. Consequently, significant interest has emerged in extending graph signal processing tools to more general representations such as hypergraphs. Compared to simple graphs, hypergraph structures are more powerful and flexible in modeling polyadic relationships in data. For instance, in a co-authorship network where a group of authors jointly contribute to a paper, a hyperedge can fully describe this polyadic relationship as illustrated in Fig. 1(b)-(c), while edges in conventional graphs (Fig. 1(a)) can only model pairwise relationships, limiting GSP to single-way analysis. Apart from co-authorship networks, high-order correlations among data widely exist in a number of applications like neuronal networks, social networks, and transportation networks [2]. To capture the intrinsic polyadic interactions of hyperedges, the hypergraph signal for each node is defined as the high-order correlation between the underlying node and the other nodes. Hypergraph neural networks (HyperGNNs), proposed to leverage the higher-order topology captured by hypergraphs, have drawn a lot of attention and have been applied to many tasks, including drug discovery [3], 3D pose estimation [4], action recognition [5], recommendation systems [6], collaborative networks [7], etc. As in the case of graphs, having a good hypergraph topology plays a crucial role in the success of hypergraph signal processing and representation learning methods. Prior efforts on hypergraph learning from data have mostly focused on the matrix representation of hypergraphs, which fails to capture the high-order structure characteristic of the data [8]-[11].
As an example, one of the most common matrix-based hypergraph representations, the clique expansion, which replaces every hyperedge with a clique subgraph, fails to provide an injective mapping. As shown in Fig. 1, two different hypergraphs, (b) H_1 and (c) H_2, are mapped to the same simple graph in (a) by the clique expansion, clearly showing how this lower-dimensional representation fails to provide an injective mapping for a hypergraph. While tensor-based hypergraph representations can differentiate these two hypergraphs, matrix-based hypergraph representations cannot [2]. However, the area of learning tensor-based hypergraphs is still in a very nascent state. In fact, the theory of signal and data processing on higher-order networks is largely unexplored compared to that of simple graphs.
Recently, a tensor-based hypergraph signal processing (HGSP) framework was introduced in [12], analogous to GSP, with a Fourier transform defined via the orthogonal symmetric canonical polyadic (CP) decomposition of the hypergraph adjacency or Laplacian tensor. While CP-based HGSP was shown to be effective in different applications, it has some fundamental drawbacks stemming from the fact that the adjacency tensor does not have an exact orthogonal CP decomposition [13]. Later, Pena-Pena et al. [13] exploited a novel set of t-product factorizations [14] to develop a new hypergraph signal processing framework, dubbed t-HGSP, which is more stable and loss-free compared to CP-based HGSP frameworks. The t-product factorizations are based on a novel tensor-tensor multiplication (t-product), in which the familiar tools of linear algebra are extended to better understand tensors [14], [15]. Based on CP-based HGSP, Zhang et al. [12] tackle the problem of learning a hypergraph from point cloud data [16] where, in order to avoid the uncertainty and high complexity of the CP decomposition, they focus on directly estimating the hypergraph eigenvector and eigenvalue pairs from the observed data instead of the hypergraph adjacency tensor. However, this approach has some limitations. On the one hand, the theory developed heavily depends on defining a supporting matrix P = VΛV^T, which maps the eigenvectors V and eigenvalues Λ obtained from the CP decomposition to a matrix. On the other hand, a feasible solution to the optimization in [12] that satisfies the constraints of having a valid adjacency tensor cannot always be found; hence, a hypergraph topology is not always made available. In their experiments, the estimated hypergraph eigenvector and eigenvalue pairs are successfully used in applications such as denoising and compression. However, the hypergraph topology, given by the adjacency tensor, is neither retrieved nor used in any of their experiments.
In order to overcome these limitations and fully exploit the higher-order correlations in observed hypergraph signals, we extend the concepts of the new hypergraph signal processing framework based on t-product factorizations, t-HGSP [13], to propose an algorithm that learns a tensor-based hypergraph representation from a set of signals. As input, we consider a set of hypergraph signals associated with each node in a network whose topology is unknown (Fig. 2(a)), and the goal of the proposed hypergraph learning algorithm is to unveil the underlying hypergraph topology represented by the adjacency tensor (Fig. 2(b)). In our experiments, we not only demonstrate the effectiveness of our approach by retrieving real hypergraphs from signals but also show that the learned hypergraphs boost the performance of recently introduced tensor-hypergraph convolutional neural networks (T-HyperGNN) [17], [18] and hypergraph signal processing applications such as clustering. In summary, the main contributions of this paper are fourfold.
• First, we introduce an algorithm that learns the hypergraph Laplacian tensor, hence the adjacency tensor, by minimizing the cumulative total variation across all observed hypergraph signals (TVL-HGSP).
• Secondly, we propose an alternative approach (PDL-HGSP) that improves the connectivity of the learned hypergraph without compromising sparsity and takes advantage of primal-dual-based algorithms to reduce time and space complexity.
• Thirdly, we develop a new hypergraph learning-convolutional neural network framework that learns the hypergraph topology and boosts the performance of recently introduced tensor-hypergraph convolutional neural networks (t-HyperGLNN).
• Fourthly, we validate the proposed approach through simulations and demonstrate its potential in real-world applications.
The rest of this paper is organized as follows. Given that the proposed algorithms aim at generalizing graph learning methods, in Section II, we review the background on graph signal processing and graph learning from data. In Section III, before introducing the proposed hypergraph learning algorithm, we lay down the necessary definitions from t-HGSP. Next, an alternative scalable hypergraph learning algorithm is presented in Section IV, which leads to Section V, in which the new hypergraph learning-convolutional neural network framework is described. The numerical experiments are summarized in Section VI.

II. LEARNING GRAPHS FROM SMOOTH SIGNALS
In graph signal processing (GSP), a graph is denoted as G = (V, W). Here, we consider the weighted adjacency matrix W = {w_i,j} to encode the graph structure, where w_i,j > 0 denotes the weight connecting nodes i and j. A graph signal is then defined as the mapping of nodes in V to real values such that x ∈ R^N is a vector representation of the signal defined on the graph, with x_i as the real value on the i-th node [19]. From G = (V, W), GSP defines a shift operator, S, as a local operation that replaces the signal's value at each node with a linear combination of the signal's values from neighboring nodes according to Sx. The notion of linearly filtering the graph signal, x, by the filter, Q, is achieved by the multiplication y = Qx, resulting in a new graph signal y. While Q can be any arbitrary matrix, it is considered a linear shift invariant (LSI) operator if it satisfies the condition that QSx = SQx.
While there is significant latitude regarding what constitutes a shift operator, a commonly used operator is the graph Laplacian defined as L = D − W, where D is a diagonal matrix such that D_i,i = Σ_j w_i,j. From L specifically, GSP defines total variation as a measure of how smoothly the signal varies on the graph structure according to TV(x) = x^T Lx. Now, since L is real and symmetric, it is diagonalizable by means of the eigenvalue decomposition L = UΛU^T, where U is a unitary matrix whose column vectors are eigenvectors and Λ is a diagonal matrix of corresponding eigenvalues [20], [21]. As N-length vectors, the columns of U are, themselves, graph signals whose corresponding eigenvalues are measures of their total variations. Furthermore, since they form a basis of R^N such that any graph signal, x, can be written as a linear combination of column vectors from U, the Graph Fourier Transform (GFT) of a signal x on G can be defined as x̂ = U^T x, with the eigenvalues interpreted as frequencies [22], [23].
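As a concrete illustration of these GSP basics, the following NumPy sketch builds the Laplacian of a small hypothetical 4-node graph (not from the paper), computes the total variation of a signal, and applies the GFT:

```python
import numpy as np

# Hypothetical 4-node ring-like weighted graph.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W                           # combinatorial graph Laplacian

# Eigendecomposition L = U diag(lam) U^T; eigh returns eigenvalues ascending.
lam, U = np.linalg.eigh(L)

# Total variation of a signal x: TV(x) = x^T L x.
x = np.array([1.0, 1.1, 0.9, 1.0])  # a fairly smooth signal
tv_smooth = x @ L @ x

# The constant signal has zero total variation (lowest frequency).
ones = np.ones(4)

# Graph Fourier transform x_hat = U^T x; inverse x = U x_hat.
x_hat = U.T @ x
```

Since `U` is orthonormal, `U @ x_hat` recovers `x` exactly, mirroring the perfect-recovery property used throughout the paper.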
For the purpose of learning the interactions of signals between nodes, one GSP approach [1] defines the problem of learning a graph from a set of observed signals x_1, . . ., x_P ∈ R^N in order to make certain properties or characteristics of the observations explicit, such as smoothness with respect to G or sparsity in a basis related to G. In the case of smoothness as a prior, the graph edges or weights should be designed to diminish in strength with the increasing deviation between nodes [24]-[27]. Many approaches have been developed based on the smoothness prior [1], [28]. Dong et al. [24], in particular, proposed learning a graph by minimizing the cumulative total variation across all observed signals according to

argmin_{L ∈ L} trace(X^T L X) + α‖L‖_F^2, subject to trace(L) = N,   (1)

where X ∈ R^{N×P} holds the set of observed signals x_1, . . ., x_P ∈ R^N concatenated column-wise and L denotes the set of valid graph Laplacians (the trace constraint avoids the trivial solution L = 0). The Frobenius norm of the Laplacian, ‖L‖_F^2, penalizes the formation of edges with large weights, while α regulates the density of connections, with larger values of α leading to graphs with denser weight matrices.
Although the method proposed by Dong et al. [24] successfully learns a graph considering a smoothness prior, their proposed optimization (Eq. 1) is difficult due to the many constraints on L and hence is not scalable [29]. Moreover, the Frobenius norm of L is not easily interpretable since L has elements of different scales which are also linearly dependent [29]. To address these challenges, Kalofolias et al. [29] not only proposed a fast, scalable, and convergent primal-dual algorithm to solve the method proposed by Dong et al. [24] but also introduced a new effective model for learning a graph.
To this end, first, Kalofolias et al. [29] argued that searching for a valid weighted adjacency matrix W instead of a valid Laplacian L is more intuitive and thus leads to simplified problems. Using the transformations of Table I, it was shown in [29] that it is possible to obtain an equivalent simplified model of Eq. 1 in terms of the weighted adjacency matrix W as

argmin_W ‖W ∘ Z‖_{1,1} + α‖W1‖_2^2 + α‖W‖_F^2, subject to ‖W‖_{1,1} = N,

where 1 ∈ R_+^N is the all-ones vector, with R_+ denoting the set of all positive real numbers, ∘ is the Hadamard product, and Z ∈ R_+^{N×N} is the pairwise distance matrix computed as

Z_{i,j} = ‖x_i − x_j‖_2^2,

where x_i, x_j ∈ R^{1×P} are the i-th and j-th rows of the set of signals X ∈ R^{N×P}, respectively. This optimization problem is reduced even further when limiting the search space to the set of valid weighted adjacency matrices, considering only their unique elements and dealing with the symmetry [29].
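The equivalence between the Laplacian and adjacency formulations rests on the identity trace(X^T L X) = (1/2)‖W ∘ Z‖_{1,1}, which can be checked numerically (a sketch with randomly generated signals and an arbitrary valid weight matrix, all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 5, 20
X = rng.normal(size=(N, P))            # P signals on N nodes (rows = nodes)

# Pairwise squared-distance matrix: Z[i, j] = ||x_i - x_j||^2.
sq = (X ** 2).sum(axis=1)
Z = sq[:, None] + sq[None, :] - 2 * X @ X.T

# Any valid symmetric nonnegative W with zero diagonal.
W = rng.uniform(size=(N, N))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)
L = np.diag(W.sum(axis=1)) - W         # corresponding Laplacian

# Smoothness identity: tr(X^T L X) = (1/2) * sum_ij W_ij * Z_ij.
lhs = np.trace(X.T @ L @ X)
rhs = 0.5 * (W * Z).sum()
```

The identity shows why smooth signals promote sparse graphs: edges between distant nodes (large `Z[i, j]`) are penalized most.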
Additionally, Kalofolias et al. [29] proposed a new model that aims at giving a general-purpose model for learning graphs when no prior information is available. To obtain meaningful graphs, the method proposed in [29] ensures that each node has at least one edge with another node by minimizing the following optimization problem:

argmin_W ‖W ∘ Z‖_{1,1} − α1^T log(W1) + β‖W‖_F^2,

where the logarithmic barrier on the node degree vector W1 forces the degrees to be positive but does not prevent individual edges from becoming zero, improving the overall connectivity of the graph without compromising sparsity. The Frobenius norm of W penalizes the formation of large edge weights but does not penalize smaller ones.
For the optimization of both models, Kalofolias et al. [29] proposed the use of scalable primal-dual techniques such as the ones reviewed by Komodakis and Pesquet in [30]. More details about the optimization algorithms used to solve these models can be found in [29].
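A minimal NumPy sketch of a forward-backward-forward primal-dual iteration of the kind reviewed in [30], applied to the graph-case log-barrier model above; the ring-graph data, step size gamma, and iteration count are illustrative assumptions of ours, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 8, 50
# Hypothetical data: signals that vary smoothly along a ring of N nodes.
t = np.arange(N)
X = np.cos(2 * np.pi * t / N)[:, None] + 0.05 * rng.normal(size=(N, P))

# Edge-wise variables: the N(N-1)/2 free weights w and their distances z.
iu, ju = np.triu_indices(N, k=1)
E = len(iu)
z = np.array([np.sum((X[i] - X[j]) ** 2) for i, j in zip(iu, ju)])

# S maps edge weights to node degrees: (S w)_k = sum of weights at node k.
S = np.zeros((N, E))
S[iu, np.arange(E)] = 1.0
S[ju, np.arange(E)] = 1.0

# Forward-backward-forward iteration for
#   min_w  2 w^T z - alpha * 1^T log(S w) + beta * ||w||^2,   w >= 0.
alpha, beta, gamma = 1.0, 0.5, 0.01
w = np.zeros(E)                      # primal variable (edge weights)
v = np.zeros(N)                      # dual variable (attached to degrees)
for _ in range(2000):
    y = w - gamma * (2 * beta * w + S.T @ v)
    ybar = v + gamma * (S @ w)
    p = np.maximum(0.0, y - 2 * gamma * z)                      # prox of f
    pbar = (ybar - np.sqrt(ybar ** 2 + 4 * alpha * gamma)) / 2  # prox of g*
    q = p - gamma * (2 * beta * p + S.T @ pbar)
    qbar = pbar + gamma * (S @ p)
    w, v = w - y + q, v - ybar + qbar

degrees = S @ w
```

At convergence the log barrier keeps every node degree strictly positive while most individual edge weights are driven to zero, which is exactly the connectivity-with-sparsity behavior the model is designed for.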
In this paper, we propose two learning models. The first, introduced in the next section, aims at learning the hypergraph Laplacian (TVL-HGSP) by extending the model proposed by Dong et al. [24]. The second, introduced in Section IV, is motivated by Kalofolias et al. [29] and aims at providing a general-purpose scalable model for learning hypergraphs when no prior information is available.

III. HYPERGRAPH LAPLACIAN LEARNING (TVL-HGSP)
We first introduce the necessary background on hypergraph signal processing using t-product factorizations, t-HGSP [13]. Then, the proposed algorithm, TVL-HGSP, which learns a hypergraph Laplacian from a set of signals by minimizing their total variation (TV), is introduced. For the rest of this paper, we denote vectors by bold lowercase letters (e.g., a), matrices by uppercase letters (e.g., A), and tensors by calligraphic letters (e.g., A). We first define the (i, j)-th tubal scalar of a 3rd-order tensor A as a_ij, which is illustrated in Fig. 3(b). The j-th lateral slice of the tensor, shown in Fig. 3(c), is denoted as A_j ≡ A(:, j, :) ∈ R^{N_1×1×N_3}, which is a vector of tubal scalars. The k-th frontal slice, depicted in Fig. 3, is denoted as A^(k) ≡ A(:, :, k). A hypergraph H = (V(H), E(H)) consists of a set of N nodes V(H) and a set of edges E(H) = {e_1, . . ., e_K} whose elements, different from simple graphs, are multi-element subsets of V(H) called hyperedges. Let M = max{|e_i| : e_i ∈ E(H)} be the maximum cardinality of the hyperedges, shortened as m.c.e(H). A hypergraph H = (V(H), E(H)) with N nodes and M = m.c.e(H) can be represented by an M-th-order N-dimensional weighted adjacency tensor A = (a_{p_1,p_2,...,p_M}), 1 ≤ p_1, p_2, . . ., p_M ≤ N. For a uniform hypergraph, in which all hyperedges have cardinality M, the entries of A are a_{p_1,...,p_M} > 0 for any hyperedge e = {v_{p_1}, v_{p_2}, . . ., v_{p_M}} and zero otherwise. Note that a hypergraph with M = 2 degrades to a simple graph with adjacency matrix A ∈ R^{N×N}; hence, HGSP is a generalization of GSP. Non-uniform hypergraphs are explained in detail in Appendix F in the supplemental material. The degree of a vertex, d(v_k), is the number of hyperedges containing the node v_k. Then, the Laplacian tensor is defined as L = D − A, where D is the superdiagonal degree tensor with diagonal elements d_{k,...,k} = d(v_k) [12], [31].
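A small sketch of this adjacency-tensor construction for a 3-uniform hypergraph; the node count and hyperedges are a hypothetical example of ours:

```python
import numpy as np
from itertools import permutations

# 3-uniform hypergraph on N = 5 nodes with hyperedges {0,1,2} and {2,3,4}.
# For M = 3 the adjacency tensor is N x N x N.
N, M = 5, 3
hyperedges = [(0, 1, 2), (2, 3, 4)]

A = np.zeros((N, N, N))
for e in hyperedges:
    for p in permutations(e):       # a_{p1,p2,p3} > 0 for every ordering
        A[p] = 1.0

# Degree of a node = number of hyperedges containing it.
degree = np.array([sum(v in e for e in hyperedges) for v in range(N)])

# Superdiagonal degree tensor and Laplacian tensor L = D - A.
D = np.zeros((N, N, N))
for k in range(N):
    D[k, k, k] = degree[k]
L = D - A
```

Note that node 2 belongs to both hyperedges, so its degree is 2, while entries such as `A[0, 0, 1]` stay zero because no hyperedge has a repeated node.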
In t-HGSP [13], symmetric adjacency and Laplacian hypergraph tensor descriptors are introduced, since the tensors A and L introduced above are not symmetric under the t-product algebra (Definition 9 in Appendix G in the supplemental material). Therefore, the operator sym(A) generates a symmetric version A_s ∈ R^{N×N×(2N+1)} of A ∈ R^{N×N×N} by adding a matrix of zeros 0_{N×N} as the first frontal slice, dividing by 2, and reflecting the transposed frontal slices of A along the third dimension.
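For M = 3, the sym(·) operator and the t-product it is designed for can be sketched with our own NumPy helpers; the slice ordering below is our reading of the description above, and the t-product follows the standard FFT-based construction:

```python
import numpy as np

def sym(A):
    """t-symmetrize an N x N x N tensor into N x N x (2N+1): a zero first
    frontal slice, the halved slices of A, then the reflected transposes."""
    N = A.shape[0]
    As = np.zeros((N, N, 2 * N + 1))
    for k in range(N):
        As[:, :, k + 1] = A[:, :, k] / 2
        As[:, :, 2 * N - k] = A[:, :, k].T / 2
    return As

def t_product(A, B):
    """t-product: DFT along the tubes, slice-wise matrix products in the
    Fourier domain, inverse DFT back."""
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.einsum('ijk,jlk->ilk', Ah, Bh)
    return np.real(np.fft.ifft(Ch, axis=2))

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3, 3))
As = sym(A)                         # shape (3, 3, 7)
Ash = np.fft.fft(As, axis=2)
# Every Fourier-domain frontal slice of a t-symmetric tensor is Hermitian.
hermitian = all(np.allclose(Ash[:, :, k], Ash[:, :, k].conj().T)
                for k in range(7))

B = rng.normal(size=(3, 3, 7))
I = np.zeros((3, 3, 7))
I[:, :, 0] = np.eye(3)              # identity tensor for the t-product
```

The Hermitian check is what makes the slice-wise eigendecompositions used later well defined, and `t_product(I, B)` returning `B` confirms that the FFT-based multiplication behaves like ordinary matrix algebra lifted to tensors.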
For higher-order tensors, the symmetrization proceeds analogously, adding zeros at the front, dividing by 2, and reflecting the (M − 1)-th-order sub-tensors A^(i) along each dimension. When applied to the degree tensor and the Laplacian tensor, we obtain D_s and L_s, respectively. The hypergraph shift is then given by Y_s = F_s ∗ X_s, where ∗ represents tensor-tensor multiplication (t-product) [14], [32], X_s is the hypergraph signal, Y_s is the one-time shifted/filtered signal, and F_s is the shifting operator that captures the relational dependencies between nodes, such as the adjacency tensor A_s or the Laplacian L_s. A detailed explanation of the t-product is included in Definition 8 in Appendix G in the supplemental material. In GSP, the graph signal is defined as an N-length vector where each signal element is related to one node in the graph. In the proposed framework, given that the shifting operation is defined by the t-product and the shifting operator is a tensor of dimension N × N × N_s^{M−2}, the hypergraph signal X_s and its one-time shifted signal Y_s are both tensors of size N × 1 × N_s^{M−2} to have consistent operations. Thus, a hypergraph signal is related to a tubal scalar which is associated with each node in the hypergraph, as shown in Fig. 4 (top-right). This setting opens up the possibilities for different hypergraph signal configurations [13]. In this paper, we build the signals according to the following definition.
Definition 1 (Hypergraph Signal from a One-Dimensional Signal): For a hypergraph with N nodes and m.c.e(H) = M, let X be an (M − 1)-th-order N-dimensional tensor computed as the outer product of an original signal x in the hypergraph, where each entry X(i_1, i_2, . . ., i_{M−1}) equals the product x_{i_1} x_{i_2} · · · x_{i_{M−1}}. Then, the hypergraph signal is obtained by expanding on a new second dimension, X = expand(X), where X is an M-th-order tensor with dimensions N × 1 × N^{M−2}, and by computing its symmetric version X_s = sym(X). Notice that, as in the CP-based HGSP framework [12], the hypergraph signal X_s in t-HGSP [13] is just another representation of an original one-dimensional signal x that aims at reflecting its properties in different dimensions. For instance, for a hypergraph with M = 3, the hypergraph signal highlights the properties of the 2-D signal components x_i x_j. As an example, let the adjacency tensor be the shifting operator, i.e., F_s = A_s ∈ R^{N×N×N_s}, and consider the 3-uniform hypergraph [13] in Fig. 4 (top). The shifted signal at node v_7 is then computed as

(y_s)_{7,1} = Σ_{j=1}^{N} (a_s)_{7,j} ∗ (x_s)_{j,1},

where, as shown in Fig. 4 (bottom), (a_s)_{7,2} and (a_s)_{7,3} are tubal scalars of the symmetrized adjacency tensor A_s that represent the hyperedge e_2 = {v_2, v_3, v_7}, and (a_s)_{7,5} and (a_s)_{7,6} represent the hyperedge e_3 = {v_5, v_6, v_7}, which are the only two hyperedges that contain the node v_7.
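For M = 3, Definition 1 amounts to an outer product followed by a reshape; a small numeric sketch with hypothetical signal values:

```python
import numpy as np

# For M = 3, the (M-1)th-order tensor is the outer product x x^T:
# entry (i, j) equals x_i * x_j, highlighting 2-D components of the signal.
N = 4
x = np.array([1.0, 2.0, 0.5, -1.0])
X2 = np.outer(x, x)                 # X(i, j) = x_i * x_j

# expand(.) inserts a singleton second dimension: N x 1 x N^{M-2}.
X = X2.reshape(N, 1, N)
```

The symmetrized version `X_s = sym(X)` would then pad and reflect along the third dimension as described in the previous subsection.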
Given the shift operator F_s, the eigendecomposition is determined by F_s = V ∗ Λ ∗ V^T, where V holds the hypergraph Fourier basis and Λ is a diagonal tensor of eigenvalue tubes. The hypergraph Fourier transform of a signal is then X̂_{F_s} = V^T ∗ X_s, with the inverse hypergraph Fourier transform given by X_s = V ∗ X̂_{F_s}. Since the tensor V is orthogonal, perfect Fourier representation and recovery of a signal are achieved.
In parallel to GSP theory, the Laplacian-based total variation is also defined in t-HGSP [13] and provides an ordering for the hypergraph Fourier basis of the Laplacian tensor L_s.
Hence, we define the Laplacian-based total variation on a hypergraph as

TV(X_s) = trace_AG(X_s^T ∗ L_s ∗ X_s).   (9)

Then, the total variation of a Fourier basis vector V_j is determined by its associated eigenvalue λ_j. Hence, the eigenvectors of the Laplacian shifting operator are ordered from lowest frequency to highest as TV(V_1) ≤ TV(V_2) ≤ · · · ≤ TV(V_N). Filtering and spectral analysis are intimately connected [33]. Notably, this fundamental concept has a natural parallel formulation in the tensor-based hypergraph Fourier transform: frequency filtering can be achieved by ŷ_{F_s}(λ_l) = ĥ_{F_s}(λ_l) ∗ x̂_{F_s}(λ_l), where ŷ_{F_s}(λ_l), x̂_{F_s}(λ_l), and ĥ_{F_s}(λ_l) are, respectively, the tubal scalars at frequency λ_l of the output signal Ŷ_{F_s}, the input signal X̂_{F_s}, and the filter response Ĥ_{F_s} in the frequency domain. When taking the inverse Fourier transform of Ŷ_{F_s}, the tubal scalars of Y_s are given by (y_s)_{j,1} = Σ_{l=1}^{N} v_{j,l} ∗ ĥ_{F_s}(λ_l) ∗ x̂_{F_s}(λ_l), which can be written in tensor-tensor product notation as Y_s = H ∗ X_s.
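The t-eigendecomposition underlying these definitions reduces, in the Fourier domain, to ordinary Hermitian eigendecompositions of the frontal slices. The sketch below builds a random t-symmetric tensor of our own (for illustration only) and verifies slice-wise diagonalization and perfect signal recovery:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical t-symmetric "Laplacian": pad with a zero slice and reflect
# transposed slices so every Fourier-domain frontal slice is Hermitian.
N, Ns = 4, 9                       # Ns = 2N + 1
L0 = rng.normal(size=(N, N, N))
Ls = np.zeros((N, N, Ns))
for k in range(N):
    Ls[:, :, k + 1] = L0[:, :, k] / 2
    Ls[:, :, Ns - 1 - k] = L0[:, :, k].T / 2

Lh = np.fft.fft(Ls, axis=2)
for k in range(Ns):
    H = Lh[:, :, k]
    assert np.allclose(H, H.conj().T)       # Hermitian frontal slices
    lam, U = np.linalg.eigh(H)              # slice-wise eigendecomposition
    assert np.allclose(U @ np.diag(lam) @ U.conj().T, H)
    # Perfect recovery of a signal through the slice-wise Fourier basis:
    x = rng.normal(size=N)
    x_hat = U.conj().T @ x
    assert np.allclose(U @ x_hat, x)
```

Because each slice diagonalizes exactly, the t-HGSP Fourier transform is stable and loss-free, in contrast to the approximate orthogonal CP decomposition discussed earlier.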

B. Learning the Laplacian Tensor
Given the t-HGSP definitions [13] and inspired by the method proposed by Dong et al. [24] for learning graphs from data, we formulate the problem of learning a hypergraph topology as follows. Given a set of hypergraph signals X_1, X_2, . . ., X_P, the tensor X_s stores these hypergraph signals concatenated along the second dimension. We infer the hypergraph topology that governs the relationship between the N nodes by minimizing the following optimization problem:

argmin_{L_s ∈ Ω} Φ(X_s, L_s) + Θ(L_s),   (11)

where Φ(X_s, L_s) is a function that measures the smoothness of the signals X_s on the hypergraph, Θ(L_s) is a term that further imposes structure on L_s, using prior information such as sparsity, and Ω is the set of valid Laplacian tensors. Considering that the TV on a hypergraph in Eq. 9 measures the smoothness of a signal in the hypergraph, we let Φ(X_s, L_s) = trace_AG(X_s^T ∗ L_s ∗ X_s), where trace_AG(•) computes the trace of a tensor and aggregates the resulting tubal scalar as explained in detail in Definition 5 in Appendix A. Similar to the method proposed by Dong et al. [24], we consider Θ(L_s) = β‖L_s‖_F^2, the Frobenius norm of L_s, which penalizes the formation of hyperedges with large weights but not the ones with smaller weights; denser hypergraphs are obtained for larger values of β.

[Fig. 5 caption: Given vec(L), the matrix M_dup3s generates the t-symmetric version of L by adding a matrix of zeros 0_{N×N} as the first frontal slice (the top rows of M_dup3s are all zeros), followed by the vector-form of the Laplacian tensor vec(L) (given by the large identity block in M_dup3s, top-center) and the reflection of each of the frontal slices of L along the third dimension (given by M_dup3s, bottom). In M_dup3s, nonzero entries (black) have value 1/2, representing the division by 2 in the t-symmetrization operation.]

Replacing these two terms in Eq. 11, one way to define an optimization problem to learn a hypergraph from data is given by:

argmin_{L_s ∈ Ω} trace_AG(X_s^T ∗ L_s ∗ X_s) + β‖L_s‖_F^2,   (13)

which can be further simplified and cast as a convex optimization problem. First, we consider the symmetry of the Laplacian tensor, which means that we only need to solve for the unique elements in L_s. We denote the vector-form of the distinct elements in L_s as vech(L_s) ∈ R^{D_L} and the vector-form of L_s as vec(L_s). As shown in Fig. 5, the vector-form of the distinct elements vech(L_s) can be mapped into the vector-form of the Laplacian tensor vec(L_s) by

vec(L_s) = M_dup3s M_dups vech(L_s),   (14)

where M_dup3s and M_dups are duplication matrices that account for the t-symmetry of L_s and the symmetry of the super-symmetric tensor L, respectively. Second, we consider the connection of the t-product with the Discrete Fourier Transform (DFT). For simplicity, let us consider the case of M = 3. If we let Γ be an N_s × N_s discrete Fourier transform matrix and Γ^{−1} = (1/N_s)Γ^H be the inverse discrete Fourier transform matrix, we can determine vec(L̃_s) from vec(L_s) as

vec(L̃_s) = K_N vec(L_s),   (15)

where K_N applies the DFT along the tubal scalars. Now, considering the above operations, we can compute trace_AG(•) in terms of vech(L_s) as

Φ(X_s, L_s) = C_x vech(L_s),   (16)

where C_x is computed as in Appendix B. Then, we can rewrite the problem in Eq. 13 as

argmin_{vech(L_s)} C_x vech(L_s) + β‖M_dup3s M_dups vech(L_s)‖_2^2, subject to A vech(L_s) = a, B vech(L_s) ≤ b,   (17)

where A and B are the matrices that handle the equality and inequality constraints that guarantee that L_s is a valid Laplacian tensor in the set Ω. As in [24], the problem in Eq. 17 is a quadratic problem with respect to the variable vech(L_s) subject to linear constraints and can be efficiently solved via interior-point methods [34]. The computational and space complexity increases with the number of vertices N and the maximum cardinality of the hyperedges M. As pointed out in [29] for the case of graphs, this optimization problem has two weaknesses. First, using the Frobenius norm on the Laplacian tensor has reduced interpretability since the entries of the Laplacian have different scales. Second, the optimization is difficult due to the many constraints needed to guarantee that L_s is a valid Laplacian tensor. As in the case of graphs, a simpler model than that in Eq. 17 is obtained when reformulated in terms of the adjacency tensor through the linear operator T_L that satisfies vec(L_s) = T_L vech(A_s), as explained in detail in Appendix C. Even though the resulting problem is simpler, inspired by the model proposed by Kalofolias et al. [29], we propose, in the next section, an alternative approach (PDL-HGSP) that improves the connectivity of the learned hypergraph without compromising sparsity and uses primal-dual-based algorithms to reduce the computational complexity.
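For intuition, the graph case (M = 2) of this quadratic problem can be approximated with projected gradient descent. The projection below onto the valid-Laplacian set is a heuristic of our own; an interior-point solver, as in [34], would be the faithful choice:

```python
import numpy as np

rng = np.random.default_rng(2)
# Graph-case sketch:  min_L  tr(X^T L X) + alpha * ||L||_F^2
# s.t. L symmetric, off-diagonals <= 0, rows sum to 0, tr(L) = N.
N, P, alpha, step, iters = 6, 40, 0.5, 0.01, 500
X = rng.normal(size=(N, P))

def project(L):
    """Heuristic projection onto valid Laplacians with trace N."""
    W = np.maximum(0.0, -(L + L.T) / 2)   # nonnegative weights from off-diag
    np.fill_diagonal(W, 0.0)
    Lp = np.diag(W.sum(axis=1)) - W       # rows sum to zero by construction
    tr = np.trace(Lp)
    if tr <= 0:                           # fallback: complete graph, tr = N
        Wc = np.ones((N, N)) - np.eye(N)
        Lp = (np.diag(Wc.sum(axis=1)) - Wc) / (N - 1)
        tr = np.trace(Lp)
    return Lp * (N / tr)

L = project(-(X @ X.T))                   # data-driven initialization
for _ in range(iters):
    grad = X @ X.T + 2 * alpha * L        # gradient of the smooth objective
    L = project(L - step * grad)
```

After the loop, `L` satisfies all the structural constraints of a valid Laplacian exactly (the projection enforces them at every step), while the objective is only approximately minimized.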

IV. LEARNING THE ADJACENCY TENSOR (PDL-HGSP)
In the same way as before, we are given a set of hypergraph signals X_s = [X_1, . . ., X_P] and we would like to infer the underlying hypergraph topology. However, for this method, we consider the fact that, in simple graphs, searching for a valid weighted adjacency matrix W instead of a valid Laplacian L is more intuitive and leads to simplified problems [29]. Thus, we formulate the optimization problem in terms of the hypergraph adjacency tensor A_s as

argmin_{A_s ∈ Ψ} Φ(X_s, A_s) + Θ(A_s),   (19)

where Φ(X_s, A_s) is a function that measures the smoothness of the signal X_s on the hypergraph, Θ(A_s) is a term that further imposes structure on A_s using prior information such as sparsity, and Ψ is the set of valid adjacency tensors.
Considering that Φ(X_s, A_s) should measure the smoothness of a signal on the hypergraph, and motivated by the method introduced in [29], we propose a pair-wise distance function in which A_s is weighted, through the element-wise t-product, by a pair-wise distance tensor Z_s whose tubal scalars (z_s)_{i,j} measure the t-norm distance between (x_s)_i and (x_s)_j, the i-th and j-th rows of tubal scalars in X_s, i.e., the hypergraph signals associated with the i-th and j-th nodes, respectively. The element-wise t-product, the t-norm ‖•‖_t, and the aggregate(•) and combine(•) operations are all defined in Appendix A. As before, this function can be computed efficiently in terms of the vector-form of the distinct elements in A_s, vech(A_s) ∈ R^{D_A}. Taking advantage of the symmetry and the Fourier-domain connection, Φ(X_s, A_s) can be written in terms of vech(A_s) through the matrices P_A, K_N, and J_z, where P_A is the matrix that considers the symmetry of the adjacency tensor, K_N, as defined in Eq. 15, applies the DFT along the tubal scalars of the adjacency tensor, and J_z is computed as explained in detail in Appendix D. Note that in this formulation the similarity of the nodal observations is determined by their distance, while in Eq. 16 it is determined by their correlation; different similarity measures could be used to measure the smoothness of signals on a hypergraph in terms of the adjacency or the Laplacian tensor. Additionally, similar to the method proposed by Kalofolias et al.
[29], in order to obtain a meaningful hypergraph, we would like to make sure that each node shares at least one hyperedge with other nodes, and it is also desirable to control the sparsity of the resulting hypergraph. Thus, we let

Θ(A_s) = −α1^T log(R vech(A_s)) + β‖A_s‖_F^2,

where R is a linear operator that satisfies vech(D_s) = R vech(A_s), with vech(D_s) ∈ R^N being the vector-form of the unique elements of the degree tensor D_s, which corresponds to the degree vector. Thus, the logarithmic barrier acting on the node degree vector vech(D_s) forces the degree of each node to be positive but does not prevent individual hyperedges from becoming zero, as in [29]. The second term, the Frobenius norm of the adjacency tensor, ‖A_s‖_F^2 = vech(A_s)^T P_A^T P_A vech(A_s), controls the sparsity by penalizing the formation of hyperedges with large weights but not those with small weights. Then, we can rewrite the problem in Eq. 19 as

argmin_{vech(A_s) ≥ 0} Φ(X_s, A_s) − α1^T log(R vech(A_s)) + β‖A_s‖_F^2.

As in [29], this problem is convex and can be solved by a primal-dual algorithm. Thus, we write the problem as a sum of three functions in order to fit it to the primal-dual algorithms reviewed by Komodakis and Pesquet [30]:

min_w f(w) + g(Rw) + h(w), with w = vech(A_s),

where f and g are functions for which we can efficiently compute proximal operators, and h is differentiable with a gradient that has Lipschitz constant ζ ∈ (0, +∞). R is a linear operator, so g is defined on the dual variable R vech(A_s). Appendix E details how the primal-dual algorithm is applied to solve the proposed optimization problem, which closely follows the steps for the case of simple graphs in [29].
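The two proximal operators needed by this split have simple closed forms in the graph case of [29]; the sketch below states them (the function names are ours) and checks the Moreau identity numerically:

```python
import numpy as np

# Proximal operators for the three-function split, graph-case forms:
#   f(w) = indicator{w >= 0} + 2 w^T z   ->  prox is a clipped shift
#   g(d) = -alpha * 1^T log(d)           ->  prox of its conjugate g*
#   h(w) = beta * ||w||^2                ->  smooth, handled via its gradient
def prox_f(y, gamma, z):
    return np.maximum(0.0, y - 2 * gamma * z)

def prox_g_conj(ybar, gamma, alpha):
    return (ybar - np.sqrt(ybar ** 2 + 4 * alpha * gamma)) / 2

# Moreau decomposition sanity check with gamma = 1:
#   prox_g(y) = y - prox_{g*}(y), and p = prox_g(y) solves p - y = alpha / p.
y = np.array([0.5, -1.0, 2.0])
alpha = 0.7
p = y - prox_g_conj(y, 1.0, alpha)
```

The closed-form `prox_g_conj` is what makes each primal-dual iteration cheap: no inner solver is needed for the logarithmic barrier.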
Complexity and Convergence: The proposed PDL-HGSP algorithm has a complexity of O(N^{2M+1}) per iteration, for N nodes and maximum hyperedge cardinality M = m.c.e(H). As in [29], since the objective functions of the proposed model are proper, convex, and lower-semicontinuous, the algorithm is guaranteed to converge to the minimum [30].

V. HYPERGRAPH LEARNING-CONVOLUTIONAL NEURAL NETWORKS (T-HYPERGLNN)
Learning the hypergraph topology can boost the performance of representation learning algorithms, as in the case of hypergraph neural networks (HyperGNNs). HyperGNNs are a family of neural networks that unlock higher-order relationships among entities, captured by hypergraphs, together with any available node attributes. Formally, given a shifting operator F and the associated node features X, the goal of HyperGNNs is to identify a representation map Φ(•) between the data X and the target representation t = Φ(X, F, {W}) that takes into account the hypergraph structure F, where {W} is the set of weight parameters learned by the model. In order to learn the representation map, we consider a cost function J(•) and a training set. Depending on the downstream task, such as node classification [35], link prediction [36], or hypergraph classification [37], the cost function is chosen accordingly. Notice that the hypergraph structure is a key component in HyperGNNs; hence, we propose tensor-based hypergraph learning-convolutional neural networks, coined t-HyperGLNN, that exploit the learned hypergraph topology to improve the overall performance of the recently introduced t-convolutional neural networks (T-HyperGNN) [17]. The update rule of the proposed t-HyperGLNN is defined as

X_s^{(l+1)} = σ(A_s^{norm} ∗ X_s^{(l)} ∗ W_s^{(l)}),

where X_s^{(l)} and X_s^{(l+1)} are the input hypergraph signals at the l-th and (l + 1)-th layer, respectively, and W_s^{(l)} is a learnable weight tensor that has nonzero elements only in its first frontal slice. The t-product operation A_s^{norm} ∗ X_s ∗ W_s, which computes linear weighted sums of neighboring features in the hypergraph, is called the hypergraph t-spectral convolution [17]. An activation function σ(•) is further applied to the output of the t-spectral convolution to model nonlinear relationships. More importantly, different from [17], the normalized adjacency tensor A_s^{norm} is here learned from data using either the method described in Section III or Section IV. Compared to the most recent HyperGNNs, the novelties of
t-HyperGLNNs are three-fold: (1) Tensor representations of hypergraphs encode polyadic relationships without reducing hypergraphs to graphs. (2) The construction of hypergraph signals captures higher-order interactions among nodes through cross-node multiplications. (3) The hypergraph topology is learned from data, capturing the underlying structure and boosting the performance of T-HyperGNN [17], especially when a hypergraph topology is not readily available.
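As a concrete illustration, the layer update above can be sketched with an FFT-based t-product. This is a minimal NumPy sketch, not the authors' implementation: the tensor shapes, the ReLU choice for σ(·), and all variable names are assumptions for illustration only.

```python
import numpy as np

def t_product(A, B):
    """t-product of 3rd-order tensors, computed slice-wise in the Fourier domain."""
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.einsum('ijk,jlk->ilk', Ah, Bh)   # matrix product per frontal slice
    return np.real(np.fft.ifft(Ch, axis=2))

def t_hyperglnn_layer(A_norm, X, W):
    """One layer: sigma(A_norm * X * W), with ReLU as the activation."""
    return np.maximum(t_product(t_product(A_norm, X), W), 0.0)

rng = np.random.default_rng(0)
N, F_in, F_out, N3 = 5, 4, 3, 6
A_norm = rng.standard_normal((N, N, N3))      # learned normalized adjacency tensor
X = rng.standard_normal((N, F_in, N3))        # hypergraph signals at layer l
W = np.zeros((F_in, F_out, N3))
W[:, :, 0] = rng.standard_normal((F_in, F_out))  # only the first frontal slice is nonzero
X_next = t_hyperglnn_layer(A_norm, X, W)      # hypergraph signals at layer l+1
```

The weight tensor mirrors the constraint stated above: only its first frontal slice carries nonzero entries.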

VI. EXPERIMENTS
We evaluate the performance of the proposed hypergraph learning algorithms. First, we consider the case in which the ground truth hypergraph is known. Second, when the ground truth hypergraph is unknown, we measure the performance of different models on spectral clustering, whose results solely depend on the hypergraph quality. Lastly, we demonstrate the benefits of the proposed approach in a critical real-world application.

A. Recovery of real-world hypergraphs
In this experiment, we test both proposed algorithms, TVL-HGSP and PDL-HGSP, by recovering the ground truth hypergraph from a set of smooth signals. The ground truth hypergraphs are subsets of real-world uniform and non-uniform co-authorship networks, Cora and DBLP, where a node is a paper and a hyperedge is formed if a collection of papers is written by the same author [39]. The statistical description of the hypergraphs used for this experiment is summarized in Table II. Even though each node has features provided by a bag-of-words model, these signals are not necessarily smooth on the co-authorship hypergraphs. Consequently, in order to recover the ground truth co-authorship hypergraph from smooth hypergraph signals, we first generate signals that are smooth on this hypergraph. From a set of Gaussian i.i.d. one-dimensional signals (concatenated column-wise as [X_1, ..., X_P]), we construct the hypergraph signals according to Definition 1. We normalize the Laplacian tensor such that L_norm_s = V * Λ_norm * V^(−1), where Λ_norm is a normalized diagonal tensor whose k-th frontal slice is computed in the Fourier domain. Then, using t-HGSP tools, we filter each of the hypergraph signals according to the Tikhonov filter h_Fs(λ_i) = (1 + αλ_i)^(−1). For all the cases, we used 100 signals (P = 100) and performed a grid search to find the best parameters for each model. Given that a feasible solution to recover the hypergraph topology cannot always be found with prior work on tensor-based hypergraph learning [16], we choose a recently introduced matrix-based approach, dubbed GroupNet [38], as the baseline. In GroupNet [38], a hypergraph is learned from node signals for a downstream task, based on the assumption that each node contributes to at least one hyperedge whose internal nodes are highly correlated. Since we know the ground truth hypergraph, the model performance metrics that we use are precision, given by the fraction of relevant hyperedges (i.e., those in the ground truth) among the retrieved
hyperedges; recall, given by the fraction of relevant hyperedges that were retrieved; and the F-measure, which is the harmonic mean of precision and recall.
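These hyperedge-recovery metrics can be computed directly over sets of hyperedges. A minimal sketch; representing each hyperedge as a frozenset of node indices is an assumption for illustration.

```python
def hyperedge_metrics(predicted, ground_truth):
    """Precision, recall, and F-measure over sets of hyperedges.
    Each hyperedge is a collection of node indices."""
    pred = {frozenset(e) for e in predicted}
    gt = {frozenset(e) for e in ground_truth}
    tp = len(pred & gt)                               # correctly retrieved hyperedges
    precision = tp / len(pred) if pred else 0.0       # relevant among retrieved
    recall = tp / len(gt) if gt else 0.0              # retrieved among relevant
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# one hyperedge recovered, one spurious, one missed
p, r, f = hyperedge_metrics([{0, 1, 2}, {2, 3}], [{0, 1, 2}, {3, 4, 5}])
```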
Table III summarizes the results for all the different hypergraphs. Note that for non-uniform hypergraphs, we also include the F-measure per set of hyperedges with the same cardinality. Additionally, in Fig. 7, we visually compare the results for the hypergraph DBLP (UN). We can see that our proposed scalable algorithm, PDL-HGSP, outperforms the other models in all cases, while TVL-HGSP and GroupNet [38] have similar performance. This not only demonstrates the effectiveness of the proposed approaches but also the benefits of the logarithmic barrier method in PDL-HGSP.

B. Unsupervised Wound Image Segmentation
We also measure the performance of different hypergraph generation models on spectral clustering, in which results solely depend on the hypergraph quality. Note that in this case the actual ground truth hypergraph is unknown. To this end, we consider the application of wound segmentation, which is an important topic in computer vision and health science. Accurate measurement of a wound area is critical to evaluating and managing chronic wounds, monitoring the wound healing trajectory, and determining future interventions. However, manual measurement is time-consuming and often inaccurate. Hence, wound segmentation from images is a desirable solution that not only automates wound area measurement but also allows efficient data entry into the electronic medical record of the patient [40]. Several deep-learning methods have been recently proposed for wound segmentation. However, these models require many densely annotated images, which can be expensive, since wound professionals are needed, and error-prone (due to labeling fatigue). Thus, we propose unsupervised and weakly-supervised algorithms as alternative approaches with lower data requirements.
The data used in this experiment was fully annotated by wound professionals in collaboration with the Advancing the Zenith of Healthcare (AZH) Wound and Vascular Center, Milwaukee, WI [40]. As depicted in Fig. 8, the wound input image is first segmented into super-pixels by the SLIC method [41]. Each super-pixel represents a homogeneous region of the image and a node in the hypergraph. As in [42], the features of each node are obtained from VGG16 [43], a pretrained convolutional neural network (CNN). In particular, for this experiment we use the feature maps from the 5th layer. From the RGB and VGG channels, the pooling feature extractor block computes the mean, variance, asymmetry, and frequency [44]. Additionally, the centroids of each super-pixel are considered, yielding a total of P = 642 features. In this experiment, we use these features to generate different hypergraphs and apply hypergraph spectral clustering to segment the wound image, as shown in Fig. 8.
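The pooling step can be illustrated with a small sketch that computes per-super-pixel statistics and centroids. This is a simplified stand-in for the extractor described above: it pools only the mean, variance, and asymmetry (skewness) per channel and omits the frequency feature and the VGG channels; all function names are hypothetical.

```python
import numpy as np

def superpixel_features(image, labels):
    """Pool per-super-pixel statistics (mean, variance, asymmetry) for each
    channel, plus the super-pixel centroid coordinates."""
    H, W, C = image.shape
    ys, xs = np.mgrid[0:H, 0:W]
    feats = []
    for s in np.unique(labels):
        mask = labels == s
        px = image[mask].astype(float)                  # (n_pixels, C)
        mu = px.mean(axis=0)
        var = px.var(axis=0)
        # asymmetry as the standardized third central moment (skewness)
        skew = ((px - mu) ** 3).mean(axis=0) / np.maximum(var, 1e-12) ** 1.5
        centroid = [ys[mask].mean(), xs[mask].mean()]
        feats.append(np.concatenate([mu, var, skew, centroid]))
    return np.stack(feats)                              # (n_superpixels, 3C + 2)

img = np.random.default_rng(1).random((8, 8, 3))        # toy RGB image
labs = np.arange(64).reshape(8, 8) // 16                # four horizontal bands as "super-pixels"
X = superpixel_features(img, labs)
```

Each row of X is then a node signal for the hypergraph learning step.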
For comparison, we consider the methods proposed by Li et al. [45] and Ahn et al. [46] for hypergraph spectral clustering (HSC), which are based on the eigenspace of what they call the hypergraph processed similarity matrix. We also compare our approaches to CP-based and t-HGSP hypergraph spectral clustering using the hypergraph Fourier space given by the symmetric orthogonal CP decomposition [12] and the t-eigendecomposition [13], respectively. For these methods, hypergraphs are not learned; instead, they are obtained from the image adaptive neighborhood hypergraph (IANH) model [47]. Additionally, we consider the prior work on tensor-based hypergraph learning [16] (CP-learn) and apply spectral clustering on the hypergraph eigenvectors estimated from data. Given that ground truth is available for the segmentation task, we use the traditional classification metrics: accuracy, F-measure (F1), precision, and recall. Results are depicted in Fig. 9, where we can see that both proposed methods, PDL-HGSP and TVL-HGSP, have similar performance and outperform prior-art algorithms.
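The spectral clustering step shared by these methods can be sketched generically as k-means on the rows of a spectral node embedding (e.g., the leading hypergraph eigenvectors). A simplified sketch, not any of the compared methods; the deterministic initialization and the toy embedding are assumptions for illustration.

```python
import numpy as np

def spectral_clusters(V, k, iters=50):
    """Cluster nodes by running k-means on the rows of the leading
    eigenvector matrix V (each row embeds one node)."""
    E = V[:, :k]
    # deterministic init: evenly spaced rows as initial centers
    centers = E[np.linspace(0, len(E) - 1, k).astype(int)]
    for _ in range(iters):
        dist = ((E[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for c in range(k):
            if (assign == c).any():
                centers[c] = E[assign == c].mean(0)
    return assign

# toy embedding: two well-separated groups of nodes
rng = np.random.default_rng(2)
V = np.vstack([np.zeros((5, 2)), np.ones((5, 2))]) + 0.01 * rng.standard_normal((10, 2))
labels = spectral_clusters(V, 2)
```

The segmentation quality then depends entirely on how well the embedding, and hence the underlying hypergraph, separates wound and background nodes.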

C. Hypergraph Learning-Convolutional Networks on Coauthorship Networks
In this experiment, we use a semi-supervised node classification task to demonstrate the benefits of learning the underlying hypergraph topology from data in representation learning applications, which is the inspiration for the proposed hypergraph learning-convolutional neural networks (t-HyperGLNN). As before, we use a different subset of the co-authorship network Cora [39]. The node features associated with each paper are the bag-of-words representations summarized from its abstract, and the node labels are classes of papers (e.g., algorithm, computing). Unlike before, here we exploit both the natural hypergraph and the node features, such that the final hypergraph combines the natural hypergraph and the hypergraph learned from the feature signals. We compare the performance against the t-convolution neural networks (T-HyperGNN) [17], which only use the natural co-authorship hypergraph. We randomly split nodes into 80% training and 20% testing sets. Note that this experiment is also an ablation study that examines the effect of using a hypergraph learned from data, which is the key component of the proposed t-HyperGLNN. For this and the following experiments, we only consider PDL-HGSP, given its performance and scalability. We use accuracy as the comparison metric and summarize the results in Table IV. Clearly, combining both the natural hypergraph and the learned hypergraph gives the best performance. However, a natural hypergraph is not always available, which is the case in the next experiment.

D. Hypergraph Learning-Convolutional Networks for Weaklysupervised Wound Segmentation
Next, we revisit the wound segmentation application of Section VI-B. In this case, however, we consider a weakly-supervised approach in which clicks on the image are available as weak signals, as shown in Fig. 8. To this end, we randomly sample 10% of the super-pixels of each input image for which we know the label. Note that this task reduces to the same semi-supervised node classification task as in Section VI-C. Given that a natural hypergraph is not known, we use the proposed algorithm, PDL-HGSP, to learn the hypergraph from the features X. For comparison, we also consider the hypergraphs generated by GroupNet [38] and the IANH model [47]. We split the data into training (40%), validation (20%), and testing (40%) sets. We use the fully-labeled images in the validation set to tune the parameters of the PDL-HGSP approach and of the IANH model [47]. Results on the testing set, summarized in Table V, show that the proposed method, PDL-HGSP, outperforms the state-of-the-art approaches. The trained t-HyperGLNN could be further used to generate pseudo-masks, and a segmentation network could then be trained, supervised by the generated pseudo-annotations [48].

VII. CONCLUSIONS
Two novel tensor-based hypergraph learning algorithms were proposed under the umbrella of t-HGSP. In particular, the proposed method PDL-HGSP was demonstrated to outperform state-of-the-art algorithms while providing more scalability than TVL-HGSP. Additionally, we proposed hypergraph learning-convolutional neural networks (t-HyperGLNN), which combine the learned hypergraphs with the recently proposed tensor-hypergraph convolutional neural networks (T-HyperGNN). We demonstrated the potential of this work in critical applications such as wound segmentation. Given the proposed algorithms, many more opportunities emerge. Our future work is focused on further improving the scalability of the proposed algorithms and exploring some of the myriad possible applications.

APPENDIX A
TENSOR-PRODUCT DEFINITIONS

Definition 2 (t-norm [49]): The t-norm of a vector of tubal scalars X ∈ R^(N1×1×N3) is the 1 × 1 × N3 tubal scalar ‖X‖ satisfying ‖X‖^2 = X^T * X.

Definition 3 (Trace of a Tensor): In this paper, we define the trace of a 3rd-order tensor A ∈ R^(N1×N1×N3), denoted as trace(A), as a 1 × 1 × N3 tubal scalar computed as trace(A) = Σ_{i=1}^{N1} a_{i,i}, where a_{i,i} ∈ R^(1×1×N3) is the i-th tubal scalar along the diagonal of A. Alternatively, by considering the connection with the discrete Fourier transform, the k-th frontal slice of the transformed trace is tr(Â^(k)), where Â^(k) is the k-th frontal slice of Â = fft(A, [], 3) and tr(·) represents the traditional trace of a matrix.

Definition 4 (Aggregation of a Tubal Scalar): This operation aggregates the elements of a tubal scalar t ∈ R^(1×1×N3) through addition as T = aggregate(t) = Σ_{k=1}^{N3} t^(k), such that T ∈ R.
Definition 5 (Trace-Aggregation of a Tensor): By combining the above definitions, we compute the trace-aggregation of a 3rd-order tensor A ∈ R^(N1×N1×N3) as a scalar, denoted as trace_AG(A). This operation aggregates the resulting trace tubal scalar as trace_AG(A) = aggregate(trace(A)).

Definition 6 (Element-wise t-product): The element-wise t-product of A ∈ R^(N1×N2×N3) and B ∈ R^(N1×N2×N3) is computed as the element-wise matrix multiplication of each pair of frontal slices in the Fourier domain, Â^(k) ∘ B̂^(k), where ∘ is the Hadamard product, also known as the element-wise product.
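Definitions 3 through 6 can be sketched in a few lines of NumPy. The function names are hypothetical; the delta-tube tensor B below acts as an identity for the element-wise t-product, which makes the last check easy to verify.

```python
import numpy as np

def tensor_trace(A):
    """Trace of a 3rd-order tensor (Definition 3): the 1 x 1 x N3 tubal
    scalar summing the diagonal tubal scalars of A."""
    return np.einsum('iik->k', A).reshape(1, 1, -1)

def aggregate(t):
    """Aggregation of a tubal scalar (Definition 4): sum of its entries."""
    return float(np.real(t).sum())

def trace_AG(A):
    """Trace-aggregation (Definition 5): aggregate(trace(A))."""
    return aggregate(tensor_trace(A))

def elementwise_t_product(A, B):
    """Element-wise t-product (Definition 6): Hadamard products of the
    frontal slices in the Fourier domain."""
    Ch = np.fft.fft(A, axis=2) * np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(Ch, axis=2))

A = np.arange(8.0).reshape(2, 2, 2)
# a tensor whose tubes are the delta [1, 0]: its FFT is all ones, so it is
# the identity element for the element-wise t-product
B = np.zeros((2, 2, 2))
B[:, :, 0] = 1.0
```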
Definition 7 (Combination of Tubal Scalars): This operation combines all the tubal scalars in a tensor. The combination of a 3rd-order tensor A ∈ R^(N1×N2×N3), denoted as combine(A), is the tubal scalar t ∈ R^(1×1×N3) obtained as t = Σ_{i=1}^{N1} Σ_{j=1}^{N2} a_{i,j}, where a_{i,j} ∈ R^(1×1×N3).

APPENDIX B
DETAILED EXPLANATION OF EQ. 16

Here, vec(t) and vec(t̂) are the vector forms of the tubal scalar t ∈ R^(1×1×Ns) and of its DFT t̂ ∈ R^(1×1×Ns), respectively. Following the properties of the trace, each frontal slice t̂^(k), which depends on the k-th frontal slice of L̂s, can be computed in terms of vech(Ls) through S^(k) vec(L̂s), where S^(k) ∈ R^(N²×N²Ns) is a selection matrix that keeps only the k-th frontal slice of vec(L̂s). Then, by grouping the slice operations over S^(1), ..., S^(Ns), we define a new matrix Cx ∈ R^(Ns×N²Ns) such that vec(t̂) = Cx K_N P_L vech(Ls).
APPENDIX C
DETAILED EXPLANATION OF EQ. 18

Here, vec(Ls) can be determined in terms of vech(As) as

vec(Ls) = P_D vech(Ds) − P_A vech(As) = (P_D R − P_A) vech(As), (36)

where R is a linear operator that satisfies vech(Ds) = R vech(As), with vech(Ds) ∈ R^N being the vector form of the unique elements of Ds, and P_D and P_A are matrices that account for the symmetry of the degree and adjacency tensors, respectively. Eq. 36 is then substituted into Eq. 35, and the Frobenius norm of the Laplacian is handled similarly.

Definition 8 (t-product [32]): The t-product of A ∈ R^(N1×N2×N3) and B ∈ R^(N2×N4×N3) is the N1 × N4 × N3 tensor A * B = fold(bcirc(A) unfold(B)),
where the operator bcirc(A) converts the set of frontal slices of the tensor A into a block-circulant matrix and unfold(B) stacks the frontal slices of B vertically into an N2N3 × N4 matrix. The operator fold() reverses this process, fold(unfold(A)) = A. Circulant matrices are diagonalized by the discrete Fourier transform; hence, the t-product can be computed efficiently in the Fourier domain, as explained in [32]. Using MATLAB notation, let Â := fft(A, [], 3) denote the tensor obtained by applying the fast Fourier transform (FFT) along each tubal element of A. For the remainder of this paper, the hat notation refers to a tensor that has gone through this operation. Thus, the t-product of A ∈ R^(N1×N2×N3) and B ∈ R^(N2×N4×N3) can also be computed by matrix multiplication of each pair of frontal slices in the Fourier domain, Â^(k) B̂^(k) for k = 1, ..., N3. The t-product can be easily extended to high-order tensors in a recursive manner.
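The equivalence between the block-circulant definition and the Fourier-domain computation can be checked numerically. A minimal NumPy sketch under assumed shapes; the function names are hypothetical.

```python
import numpy as np

def bcirc(A):
    """Block-circulant matrix built from the frontal slices of A."""
    N1, N2, N3 = A.shape
    blocks = [[A[:, :, (i - j) % N3] for j in range(N3)] for i in range(N3)]
    return np.block(blocks)                       # (N1*N3, N2*N3)

def unfold(B):
    """Stack the frontal slices of B vertically into an (N2*N3, N4) matrix."""
    return np.concatenate([B[:, :, k] for k in range(B.shape[2])], axis=0)

def t_product_bcirc(A, B):
    """t-product as fold(bcirc(A) unfold(B))."""
    N3 = A.shape[2]
    C = bcirc(A) @ unfold(B)
    return np.stack(np.split(C, N3, axis=0), axis=2)   # fold back into a tensor

def t_product_fft(A, B):
    """Equivalent t-product via slice-wise products in the Fourier domain."""
    Ch = np.einsum('ijk,jlk->ilk', np.fft.fft(A, axis=2), np.fft.fft(B, axis=2))
    return np.real(np.fft.ifft(Ch, axis=2))

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4, 5))
B = rng.standard_normal((4, 2, 5))
```

Both routes yield the same tensor; the FFT route replaces one large matrix product with N3 small ones.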
Thus, each successive t-product operation involves tensors of one order less, and at the base level of the recursion there is a t-product of 3rd-order tensors.
Given that the extension to high-order tensors is obtained by recursion, for simplicity and without loss of generality, we present the following definitions for the base case.
Definition 9 (Transpose and Symmetric Tensors): The transpose of a 3rd-order tensor A ∈ R^(N1×N2×N3), denoted as A^T, is the tensor obtained by transposing each of the frontal slices and then reversing the order of the transposed frontal slices 2 through N3. For a higher-order tensor A ∈ R^(N1×N2×···×Np), its transpose A^T ∈ R^(N2×N1×···×Np) is obtained by recursively transposing each A^(l) for l = 1, 2, ..., Np and then reversing the order of the A^(l)'s from l = 2 to l = Np, so that the slices of A^T are (A^(1))^T, (A^(Np))^T, ..., (A^(2))^T.
The tensor A is symmetric if A = A T [32].
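Definition 9 can be sketched in NumPy for the 3rd-order base case, including a numerical check of the identity (A * B)^T = B^T * A^T, which is what the slice-reversal convention is designed to preserve. Function names are hypothetical.

```python
import numpy as np

def t_transpose(A):
    """Tensor transpose (Definition 9): transpose each frontal slice,
    then reverse the order of slices 2 through N3."""
    At = A.transpose(1, 0, 2)
    # keep slice 1 first, then slices N3, N3-1, ..., 2
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def t_product(A, B):
    """t-product via slice-wise products in the Fourier domain."""
    Ch = np.einsum('ijk,jlk->ilk', np.fft.fft(A, axis=2), np.fft.fft(B, axis=2))
    return np.real(np.fft.ifft(Ch, axis=2))

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3, 4))
B = rng.standard_normal((3, 3, 4))
S = 0.5 * (A + t_transpose(A))   # a symmetric tensor: S equals its transpose
```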

Fig. 1 .
Fig. 1. In a co-authorship network, two different hypergraphs, (b) H1 and (c) H2, are mapped to the same simple graph in (a) by the clique expansion. Hyperedges are color-coded by publication; e.g., the red hyperedge (e3) indicates that Carl, Dan, and Ed coauthored a publication.

Fig. 2 .
Fig. 2. (a) As input, we only have a set of hypergraph signals associated with each node or entity in a network whose topology is unknown. Once these signals are fed to the hypergraph learning algorithm, the underlying (b) hypergraph topology represented by the adjacency tensor A is unveiled. The adjacency tensor A in (b) hence captures polyadic relationships from the data in (a).

Fig. 5 .
Fig. 5. Given the vector vech(Ls) ∈ R^(D_L) (top-right), which contains only the distinct elements in Ls, the binary matrix M_dups (nonzero entries in black) replicates its entries to build the super-symmetric tensor L ∈ R^(N×N×N) (bottom-left), whose vector form is denoted as vec(L) ∈ R^(N³). Now, given vec(L), the matrix M_dup3s generates the t-symmetric version of L by adding a matrix of zeros 0_(N×N) as the first frontal slice (the top rows of M_dup3s are all zeros), followed by the vector form of the Laplacian tensor vec(L) (given by the large identity block in M_dup3s, top-center) and the reflection of each of the frontal slices of L along the third dimension (given by the bottom part of M_dup3s). In M_dup3s, nonzero entries (black) have value 1/2, representing the division by 2 in the t-symmetrization operation.

Fig. 6 .
Fig. 6. (Top) The vector form of the Laplacian tensor Ls in the Fourier domain, vec(L̂s), is determined by multiplying the vector form of the Laplacian tensor vec(Ls) by the matrix K_N = M_t2f Γ_(N²) M_f2t. Note that Γ_(N²) is a block-diagonal matrix with Γ repeated N² times along the diagonal. (Bottom) The operator M_f2t transforms a vectorized tensor organized in frontal slices (which are color-coded) into a vectorized tensor organized in scalar tubes. M_t2f reverses this operation.
K_N vec(Ls) = vec(L̂s), where, as depicted in Fig. 6, Γ_(N²) is a block-diagonal matrix with Γ repeated N² times along the diagonal, M_f2t transforms a vector organized in frontal slices into a vector organized in scalar tubes, and M_t2f works in the reverse direction, as shown in Fig. 6 (bottom).
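The reorderings performed by M_f2t and M_t2f can be illustrated by building the permutation matrix explicitly on a small tensor. A sketch under an assumed column-major tube ordering (the paper's exact ordering may differ; only the permutation structure between the two vectorizations is illustrated).

```python
import numpy as np

def vec_frontal(A):
    """Vectorize frontal slice by frontal slice (column-major per slice)."""
    return np.concatenate([A[:, :, k].flatten(order='F') for k in range(A.shape[2])])

def vec_tubes(A):
    """Vectorize tube by tube: the N3 entries of each (i, j) tube are
    contiguous (tube order assumed column-major)."""
    N1, N2, _ = A.shape
    return np.concatenate([A[i, j, :] for j in range(N2) for i in range(N1)])

A = np.arange(24.0).reshape(2, 3, 4)   # unique entries, so positions are identifiable
vf, vt = vec_frontal(A), vec_tubes(A)

# M_f2t is the permutation matrix mapping slice ordering to tube ordering;
# build it by locating each tube-ordered entry inside vf
perm = np.array([int(np.where(vf == v)[0][0]) for v in vt])
M_f2t = np.eye(vf.size)[perm]
M_t2f = M_f2t.T                        # permutation matrices are orthogonal
```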

Fig. 7 .
Fig. 7. Recovery of the ground truth hypergraph, DBLP (UN), from a set of smooth signals, using: (a) GroupNet [38], (b) TVL-HGSP, and (c) PDL-HGSP. Hyperedges are color-coded: (green) predicted hyperedges that are also in the GT hypergraph, (red) predicted hyperedges that are not in the GT, and (blue) hyperedges in the GT that were not predicted. Note how the number of mis-predicted hyperedges decreases significantly for the proposed PDL-HGSP model.

Fig. 8 .
Fig. 8. Pipeline for wound segmentation using both unsupervised (Experiment VI-B) and weakly-supervised (Experiment VI-D) algorithms. We used SLIC super-pixel segmentation, VGG16, and pooling feature extraction to obtain the signals in X, from which we learn a hypergraph. Then, we apply either clustering (unsupervised) or HyperGNNs (weakly-supervised) to segment the wound in the input image. Note that for HyperGNNs the input image is weakly labeled through clicks.

Fig. 9 .
Fig. 9. Performance comparison of different hypergraph spectral clustering methods on wound segmentation.