On Edge Reweighting for Link Prediction with Graph Auto-Encoders

During the last five years, graph auto-encoders have become popular unsupervised methods, based on graph neural networks, to learn a node embedding from a graph. Researchers train graph auto-encoders by optimizing reconstruction losses that are computed from the connected node pairs (the edges) and the non-connected node pairs of the graph. As many graphs are sparse, researchers often positively reweight the edges in these reconstruction losses. In this paper, we report an analysis of the effect of edge reweighting on the node embedding. We show that, on a link prediction problem, results are quite insensitive to edge reweighting, with the exception of very unbalanced reconstruction losses. We also discuss whether training models from perfectly balanced reconstruction losses is optimal or suboptimal, in terms of average scores and of standard deviations.


I. INTRODUCTION
The research activity related to the development of machine learning methods for graph (a.k.a. network) data has grown at a fast pace over the past few years [1]. It has become one of the most active and intriguing sub-areas of deep learning [2,3,4]. Among others, several researchers have designed graph neural network [3,5,6] architectures to learn node embeddings [1,2,3,4]. A node embedding is a vector space, generally learnt by a graph neural network, in which each node of a given graph is represented by a vector. Similar nodes in the graph have close vectors in this space. Working with a node embedding instead of the graph itself can be useful to solve machine learning problems involving the nodes [1,2,3,4,6,7]. However, a majority of these graph neural networks must be trained in a supervised way. Indeed, researchers often update the parameters of these neural networks by minimizing a loss that involves labels on each node or on a subset of nodes [1,3,6]. This is limiting, as such labels are sometimes unavailable.
During the last five years, graph auto-encoders [7] have become popular methods that extend graph neural networks to learn a node embedding in an unsupervised way. Instead of dealing with node labels, graph auto-encoders optimize a reconstruction loss that is computed from the connected node pairs (the edges) and the non-connected node pairs of the graph. The loss decreases if the graph auto-encoder can correctly predict, using the node embedding, which node pairs are connected and which are not connected in the original graph. In other words, we assess the quality of the embedding by checking whether, starting from this space, it is possible to output a reconstructed graph that is quite similar to the true data. Graph auto-encoders (and their derivatives, such as graph variational auto-encoders [7]) have recently been used to deal with a wide range of research problems. Some famous examples are link prediction [7,8,9,10,11], node clustering [12,13,14] and the generation of small graphs such as molecules [15,16,17]. Some papers also showed that graph auto-encoders give interesting results for very large graphs with several millions of nodes [18,19].
As many graphs are sparse, researchers almost always reweight the edges in the reconstruction loss, with respect to the non-connected node pairs which are more numerous. Most implementations simply set a positive edge reweighting scalar parameter in the reconstruction loss. A deep experimental analysis of the effect of this reweighting on the node embedding is missing. In this paper, we conduct and report the results of such an analysis. For experiments, we focus on the usage of the node embedding vectors to address the link prediction task [20], because this is the most common task to assess the quality of graph auto-encoders. Our analysis shows that the link prediction results are quite insensitive to unbalanced reconstruction losses, with the exception of extreme cases. Our analysis also tends to show the interest of keeping a fairly balanced loss, as well as the interest of slightly overweighting edges with respect to non-connected node pairs, on some of our graphs.

II. METHODS
We have an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $|\mathcal{V}|$ nodes and $|\mathcal{E}|$ edges. We summarize $\mathcal{G}$ by the adjacency matrix $A$ of dimension $|\mathcal{V}| \times |\mathcal{V}|$ which is defined as: $A_{ij} = 1$ if the node pair $(i, j)$ is connected by an edge ($(i, j) \in \mathcal{E}$) and $A_{ij} = 0$ otherwise. We want to give to each node $i$ a vector $z_i$ of dimension $d < |\mathcal{V}|$ in a node embedding space ℨ. Also, in the next paragraphs, $Z$ will be the matrix of size $|\mathcal{V}| \times d$ in which row $i$ corresponds to the vector $z_i$.
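A graph auto-encoder [7] is an unsupervised model with an encoder and a decoder. The encoder is a parametrized function $f$; in modern research this function will be a graph neural network [3,5,6]. Its input is $A$ and its output is $Z$. In our experiments, we will follow the first works of Kipf and Welling [6,7] and our encoder will thus be a 2-layer graph convolutional network:
$$Z = f(A) = \tilde{A} \, \mathrm{ReLU}(\tilde{A} W_0) \, W_1,$$
with:
• $\tilde{A}$ the normalized version of the adjacency matrix $A$: $\tilde{A} = D^{-1/2} (A + I_{|\mathcal{V}|}) D^{-1/2}$, with $D$ the degree matrix of $A + I_{|\mathcal{V}|}$;
• $W_0$ a $|\mathcal{V}| \times h$ weight matrix and $W_1$ another $h \times d$ weight matrix (to learn).
Kipf and Welling [7] set $h = 32$ and $d = 16$, and we will choose similar dimensions in our experiments. Once the matrix $Z$ (the node embedding vectors) is computed, the decoder will reconstruct an approximate version of $A$, named $\hat{A}$ and of dimension $|\mathcal{V}| \times |\mathcal{V}|$, as follows:
$$\hat{A} = \sigma(Z Z^{\top}), \quad \text{i.e.} \quad \hat{A}_{ij} = \sigma(z_i^{\top} z_j),$$
where $\sigma$ is the sigmoid function. Intuitively, the node embedding vectors in $Z$ will be of "good quality" if the reconstructed matrix $\hat{A}$ is equal or very close to the true initial data $A$. So, researchers train the graph auto-encoder by gradient descent to minimize a reconstruction loss $\mathcal{L}(A, \hat{A})$ [7]. Kipf and Welling [7] set:
$$\mathcal{L}(A, \hat{A}) = -\frac{1}{|\mathcal{V}|^2} \sum_{(i,j) \in \mathcal{V} \times \mathcal{V}} \left[ w \, A_{ij} \log \hat{A}_{ij} + (1 - A_{ij}) \log (1 - \hat{A}_{ij}) \right].$$
We note the presence of an edge reweighting scalar parameter $w$ (usually $w > 1$) in $\mathcal{L}(A, \hat{A})$. Indeed, many graphs being sparse, researchers felt the need to positively reweight the edges in the reconstruction losses, with respect to the non-connected node pairs which are more numerous ($|\mathcal{V} \times \mathcal{V} \setminus \mathcal{E}|$ vs $|\mathcal{E}|$). For instance, Kipf and Welling [7] set
$$w = \frac{|\mathcal{V} \times \mathcal{V} \setminus \mathcal{E}|}{|\mathcal{E}|},$$
and this choice ensures that the "positive" (edges) and "negative" (non-connected node pairs) parts of the reconstruction loss have the same relative importance.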
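As an illustration, the encoder, the decoder and the reweighted reconstruction loss described above can be sketched in NumPy. This is a minimal sketch under assumptions, not the implementation used in the experiments: the function names, the toy 4-node graph, the random weights and the tiny dimensions (3 and 2 instead of 32 and 16) are hypothetical, the encoder is featureless (its only input is the adjacency matrix), and the loss is averaged over all ordered node pairs.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    A_self = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_self.sum(axis=1))
    return A_self * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def encode(A, W0, W1):
    """2-layer featureless GCN encoder: A_tilde ReLU(A_tilde W0) W1."""
    A_tilde = normalize_adjacency(A)
    hidden = np.maximum(A_tilde @ W0, 0.0)  # ReLU
    return A_tilde @ hidden @ W1

def decode(Z):
    """Inner-product decoder: sigmoid(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))

def reconstruction_loss(A, A_hat, w, eps=1e-10):
    """Cross-entropy in which the edge terms are multiplied by w."""
    pos = w * A * np.log(A_hat + eps)            # connected node pairs
    neg = (1.0 - A) * np.log(1.0 - A_hat + eps)  # non-connected node pairs
    return -np.mean(pos + neg)

# Toy path graph 0-1-2-3 (hypothetical data, for illustration only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
hidden_dim, embed_dim = 3, 2  # tiny stand-ins for 32 and 16
W0 = rng.normal(scale=0.5, size=(A.shape[0], hidden_dim))
W1 = rng.normal(scale=0.5, size=(hidden_dim, embed_dim))

Z = encode(A, W0, W1)  # one embedding vector per row
A_hat = decode(Z)      # reconstructed adjacency matrix

# Balanced weight: number of zero entries over number of one entries of A
# (ordered pairs are counted, so each undirected edge appears twice).
w = (A.size - A.sum()) / A.sum()
loss = reconstruction_loss(A, A_hat, w)
print(f"w = {w:.3f}, loss = {loss:.4f}")
```

In a real training loop, the two weight matrices would be updated by gradient descent on this loss; an automatic differentiation library would be the natural choice for that step.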
However, the choice of $w$ and the effect of edge reweighting on the node embedding have never been deeply studied. Existing research sets the value of $w$ in one line of code, without further study. In the next two sections of the paper, we conduct and report the empirical results of such a study, focusing on link prediction applications. Our objective is to assess the effect of setting:
$$w = \lambda \times \frac{|\mathcal{V} \times \mathcal{V} \setminus \mathcal{E}|}{|\mathcal{E}|}$$
for different values of the scalar parameter $\lambda \in \mathbb{R}^{+}$. Setting $\lambda = 1$ leads to an artificially balanced reconstruction loss (as before), while setting $\lambda < 1$ will underweight edges with respect to non-connected node pairs and setting $\lambda > 1$ will overweight edges with respect to non-connected node pairs.
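To make the role of the scalar multiplier concrete, the short sketch below computes the edge weight for a few values of the multiplier and compares the total weight carried by the edges with the total weight carried by the non-connected pairs. The function name `edge_weight`, the variable `lam` (standing for the scalar parameter of the text) and the graph sizes are hypothetical, and node pairs are counted as ordered pairs, i.e. as entries of the adjacency matrix.

```python
def edge_weight(num_nodes, num_edge_entries, lam=1.0):
    """Edge weight: lam * (number of non-connected pairs) / (number of edge pairs).

    num_edge_entries is the number of non-zero entries of the adjacency
    matrix (twice the number of undirected edges).
    """
    num_non_edges = num_nodes * num_nodes - num_edge_entries
    return lam * num_non_edges / num_edge_entries

# Hypothetical sparse graph: 1 000 nodes, 5 000 undirected edges.
num_nodes, num_edge_entries = 1000, 10_000
num_non_edges = num_nodes * num_nodes - num_edge_entries

for lam in (0.001, 0.01, 1.0, 1.25, 100.0):
    w = edge_weight(num_nodes, num_edge_entries, lam)
    # Total weight on edge terms divided by total weight on non-edge terms.
    ratio = (w * num_edge_entries) / num_non_edges
    print(f"lam={lam:>7}: w={w:.3f}, positive/negative weight ratio={ratio:.3f}")
```

In this sketch, the ratio between the two total weights equals the multiplier itself (up to floating-point error), which is why a value of 1 corresponds to a balanced loss.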

III. EXPERIMENTAL DESIGN
For evaluation, we follow the experimental procedure of previous papers [7,8,9,10,11,18,19] and perform link prediction. The goal is to assess, for different values of the scalar parameter $\lambda$, our performance at predicting whether two nodes $i$ and $j$ from the original graph are connected by an edge $(i, j)$ or not, only by using the learnt node embedding vectors $z_i$ and $z_j$ and the associated reconstructed cell $\hat{A}_{ij}$. We will report results on the three citation networks Cora, Citeseer and Pubmed (we refer to [6] for more details), whose statistics are available in Table 1. These three graphs are very relevant for our study, because they are commonly used, and because they are very sparse and thus will all require the tuning of an edge reweighting scalar parameter to return good results.
More precisely, we will train several graph auto-encoders on masked versions of the original graph data, with only 85% of the edges. Among the missing edges, 5% of all edges are put in a validation set and 10% are put in a test set, together with the same number of non-connected pairs of nodes (selected randomly). These numbers are on par with previous papers. Are we able to retrieve the missing edges in the test set? This is actually a binary classification problem, which we will evaluate using the following two metrics:
• AUC: Area Under the receiver operating characteristic Curve [21].
• AP: Average Precision score [22].
We chose to train all the models using the Adam algorithm [23] with a learning rate of 0.01, for 200 epochs of training and with $h = 32$ and $d = 16$ (as explained in Section II). For the Pubmed graph, which is larger than Cora and Citeseer, we used the FastGAE code from [19] for faster evaluations.
Table 2 shows our results on the three graphs Cora, Citeseer and Pubmed. All the AUC and AP scores are in percentage and are averaged over 20 trainings of the graph auto-encoder model, and we also present the corresponding standard deviations over all these different trainings (to account for the volatility due to the randomness in edge masking). We tested a wide range of values for the parameter $\lambda$. As explained in Section II, setting $\lambda = 1$ is equivalent to a balanced reconstruction loss, whereas setting $\lambda < 1$ will underweight edges with respect to non-connected node pairs and setting $\lambda > 1$ will overweight edges with respect to non-connected node pairs.
Foremost, we see in Table 2 that the AUC and AP link prediction scores on the test sets are quite insensitive to the choice of $\lambda$ in the graph auto-encoder reconstruction loss, with the exception of extreme values ($\lambda$ = 0.001, 0.01, 100 or 1 000). For all other values of $\lambda$, we reach scores that are quite close to the scores of the balanced reconstruction loss with $\lambda = 1$. Another result from our experimental analysis is that fine-tuning $\lambda$ (particularly to overweight the edges with respect to the non-connected node pairs) can sometimes very slightly improve the results. In Table 2, for the Cora graph and for the Citeseer graph, choosing $\lambda = 1.25$ and $\lambda = 1.05$, respectively, is optimal. Finally, we acknowledge that the standard deviations in our experiments are quite large and that differences are not necessarily significant; yet, we see that selecting $\lambda$ around 1 decreases the volatility of the scores with respect to very unbalanced reconstruction losses.
Future work on larger graphs could be necessary to really confirm our results on the advantage of $\lambda > 1$.
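The evaluation protocol described in this section (edge masking, then AUC and AP on held-out pairs) can be sketched in plain Python as follows. This is only an illustrative sketch under assumptions: the helper names, the toy ring graph and the example scores are hypothetical, each undirected edge is assumed to appear once in the input list, the AUC is computed by brute-force pairwise comparison, and ties in the AP ranking are not grouped (positives are ranked before negatives with the same score).

```python
import random

def split_edges(edges, num_nodes, val_frac=0.05, test_frac=0.10, seed=0):
    """Mask edges into train/validation/test sets and sample as many
    non-connected pairs as there are held-out edges."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    rng.shuffle(edges)
    n_val = int(len(edges) * val_frac)
    n_test = int(len(edges) * test_frac)
    val, test = edges[:n_val], edges[n_val:n_val + n_test]
    train = edges[n_val + n_test:]
    edge_set, non_edges = set(edges), set()
    while len(non_edges) < n_val + n_test:
        i, j = rng.randrange(num_nodes), rng.randrange(num_nodes)
        pair = (min(i, j), max(i, j))
        if i != j and pair not in edge_set:
            non_edges.add(pair)
    non_edges = sorted(non_edges)
    rng.shuffle(non_edges)
    return train, val, test, non_edges[:n_val], non_edges[n_val:]

def auc_score(pos_scores, neg_scores):
    """Probability that a true edge is scored above a non-edge."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))

def average_precision(pos_scores, neg_scores):
    """Mean of the precision values measured at the rank of each true edge."""
    ranked = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores],
                    key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / len(pos_scores)

# Toy ring graph with 40 nodes and 40 edges (hypothetical data).
ring = [(i, (i + 1) % 40) for i in range(40)]
train, val, test, val_neg, test_neg = split_edges(ring, num_nodes=40)
print(len(train), len(val), len(test), len(val_neg), len(test_neg))

# Hypothetical decoder outputs for three test edges and three non-edges.
pos_scores, neg_scores = [0.9, 0.8, 0.4], [0.7, 0.3, 0.2]
auc = auc_score(pos_scores, neg_scores)
ap = average_precision(pos_scores, neg_scores)
print(f"AUC = {auc:.3f}, AP = {ap:.3f}")
```

In practice the scores would be the reconstructed cells of the decoder for the test pairs, and the two metrics would be averaged over several trainings with different random masks.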

V. CONCLUSION
As many graphs are sparse, researchers often positively reweight the edges of their reconstruction losses when training graph auto-encoder models, with graph neural network encoders and inner product decoders. However, a deep experimental analysis of the effect of this reweighting on the model was missing. In this paper, we reported and commented on the results of such an analysis. We focused on the usage of graph auto-encoders for link prediction on three popular citation networks. We showed that the link prediction performances are quite insensitive to unbalanced reconstruction losses, with the exception of extreme values. We also explained the potential interest of keeping a fairly balanced loss as well as slightly overweighting edges with respect to non-connected node pairs, in terms of optimal scores and of reduced standard deviations. Future studies will try to confirm our results on variants of graph auto-encoders, such as graph variational auto-encoders (preliminary experiments are conclusive), as well as on different graph datasets.