A Multiobjective Evolutionary Approach for Solving Large-Scale Network Reconstruction Problems via Logistic Principal Component Analysis

Currently, the problem of uncovering complex network structure and dynamics from time series is prominent in many fields. Despite recent progress in this area, reconstructing large-scale networks from limited data remains a tough problem. Existing works treat the connections of nodes as continuous values, leaving the challenge of setting a proper cut-off value to decide whether a connection exists. Moreover, their performance on large-scale networks is far from satisfactory. Considering the reconstruction error and the sparsity as two objectives, this article proposes a subspace learning-based evolutionary multiobjective network reconstruction algorithm, called SLEMO-NR, to solve the aforementioned problems. In the evolutionary process, we assume that binary-coded individuals obey the Bernoulli distribution and can use the probability and the natural parameter as alternative representations. Moreover, our approach utilizes logistic principal component analysis (LPCA) to learn a subspace containing the features of the network structure. Offspring solutions are generated in the learned subspace and then mapped back to the original space via LPCA. Benefitting from the alternative representations, a preference-based local search operator (PLSO) is proposed to concentrate on finding solutions close to the true sparsity. Experimental results on synthetic networks and six real-world networks demonstrate that, owing to the well-learned network structure subspace and the preference-based strategy, our approach is effective in reconstructing large-scale networks compared to six existing methods.


I. INTRODUCTION
The study of complex networks is common in many fields. There exist unifying principles underlying the topology of networks, which are helpful for controlling collective dynamics [1]. However, directly accessing network structures and nodal dynamics is impossible in many cases, where only limited observable data are available [2]. It is thus imperative to develop network reconstruction approaches that uncover network structures from data extracted from observations or experiments. Potential applications exist in the fields of gene regulatory networks [3]-[5], catastrophe prediction [6], and brain function mapping [7].
The network reconstruction problem, an inverse problem, has become a hot topic in contemporary network science and engineering [8], [9]. Some works focus on reconstructing multilayer networks [10], [11], while in this paper we only consider networks with a single layer. Since the structural information of the network is hidden in the measurement data and the decision space is of extremely high dimension, accurate reconstruction of complex networks from limited data is challenging, especially for large-scale networks with hundreds or thousands of nodes. Usually, the network reconstruction problem (NRP) [12] is converted into a sparse signal reconstruction problem. To recover the original signal, one needs to consider the following sparse optimization problem:

min_X ||X||_0   s.t.   Y = AX,   (1)

where X ∈ R^N is the neighboring vector, Y ∈ R^M is an observation vector of length M, and A ∈ R^{M×N} is a sensing matrix. In most cases, the observation data are limited, which means M ≪ N. Generally, X needs to be solved from known Y and A. In NRPs, x_ij ∈ {0, 1} is an element of X, where 1 and 0 represent connection and disconnection of nodes i and j, respectively. The objective function ||X||_0 is the L_0-norm of X, defined as the number of nonzero elements in X. Problem (1) is usually converted into an unconstrained optimization problem of the following form:

min_X ||Y − AX||_2^2 + λ||X||_0,   (2)

where the L_2-norm term ||Y − AX||_2^2 is the loss term, ||X||_0 is the regularization term, and λ is a regularization parameter with a positive value. Solving (2) has been proven to be NP-hard [13].
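As a small numerical sketch of problems (1) and (2) (illustrative only; the dimensions, Gaussian sensing matrix, and sparsity below are arbitrary assumptions, not from the paper), the following Python snippet sets up an underdetermined system Y = AX with a binary k-sparse X and evaluates the two terms of (2):

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, k = 200, 40, 6          # nodes, observations (M << N), assumed true sparsity
X_true = np.zeros(N)
X_true[rng.choice(N, size=k, replace=False)] = 1.0   # binary neighboring vector

A = rng.normal(size=(M, N))   # stand-in sensing matrix (built from dynamics in practice)
Y = A @ X_true                # observation vector

def loss(X):
    """Loss term of (2): ||Y - AX||_2^2."""
    return float(np.sum((Y - A @ X) ** 2))

def l0(X):
    """Regularization term of (2): ||X||_0, the number of nonzero elements."""
    return int(np.count_nonzero(X))
```

At the true structure the loss vanishes while ||X||_0 equals the number of links, which is exactly the trade-off the two terms of (2) encode.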
So far, many approaches have been proposed to solve small-scale NRPs, where the regularization term in (2) is replaced by ||X||_1 in most cases [14], [15]. Napoletani et al. [16] proposed a constrained optimization technique based on chaotic time-series analysis to reconstruct nonlinear dynamical networks, which appeared to be limited to small-scale networks with sparse connections. Han et al. [17] pointed out that the vectors to be reconstructed are typically sparse, owing to the natural sparsity of realistic nonlinear dynamical networks, and thus can be solved by a convex optimization algorithm named the least absolute shrinkage and selection operator (LASSO) [18]. Wang et al. [19] came up with an efficient approach to reconstructing complex networks from evolutionary-game data via compressive sensing (CS) [20]; compared with conventional reconstruction problems, the observation data requirements were relaxed considerably. Shen et al. [21] developed a CS-based framework to reconstruct complex networks with stochastic spreading dynamics from small amounts of polarized data, which could also be used to identify hidden sources. Unlike the above-mentioned approaches based on machine learning (ML), evolutionary algorithms (EAs) [22] have also been introduced to solve NRPs. Wu et al. [23] formulated the NRP as a multiobjective optimization problem (MOP) and adopted an EA with LASSO initialization to solve it. To improve the reconstruction accuracy, Wu et al. [24] further utilized a memetic algorithm (MA) [25] to solve the non-convex problem (2) directly.
Despite the success in modeling and reconstructing complex networks, previous approaches still have two major limitations. First, for unweighted networks, it is natural to use binary-coded variables to represent the existence and inexistence of edges. However, whether ML-based or EA-based, existing approaches treat the neighboring vector X as a real-coded variable. Using real-coded variables entails the inevitable problem of setting a cut-off value to distinguish existent links from null connections: elements with values smaller than the cut-off value correspond to zeros in the adjacency matrix, whereas those with larger values are treated as corresponding to actual links. However, the reconstructed values are dispersed, leading to ambiguities in the identification of links. For example, if a solution X_i = (0.69, 0.27, 0.95, 0.03, 0.52, 0.38) is finally obtained and the true structure is X_i,true = (1, 0, 1, 0, 0, 0), then setting the cut-off value to 0.3, 0.5, or 0.7 makes a great difference. It is thus challenging to set a proper cut-off value that maximizes the reconstruction performance. Second, due to the large search space, most previous approaches can only cope with small-scale NRPs with dozens of nodes. Besides requiring nodal dynamics information, their computational cost tends to increase dramatically with the size of the network [26]-[28]. For large-scale NRPs with more than 400 nodes and thousands of edges to be determined, the reconstruction accuracy is far from satisfactory.
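The cut-off ambiguity can be reproduced directly with the example vector from the text; the helper `binarize` below is our own illustrative name, not part of any cited method:

```python
import numpy as np

X_i = np.array([0.69, 0.27, 0.95, 0.03, 0.52, 0.38])  # real-coded solution
X_true = np.array([1, 0, 1, 0, 0, 0])                 # true structure

def binarize(x, cutoff):
    """Threshold a real-coded solution into a binary link vector."""
    return (x > cutoff).astype(int)

# Number of misidentified links for each candidate cut-off value
errors = {c: int(np.sum(binarize(X_i, c) != X_true)) for c in (0.3, 0.5, 0.7)}
```

Here no cut-off recovers the true structure exactly, and different choices give different error counts, which is precisely the ambiguity that binary-coded individuals avoid.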
To overcome these shortcomings, we develop an approach to reconstructing large-scale networks from time series via a subspace learning-based evolutionary multiobjective network reconstruction algorithm, termed SLEMO-NR. Multiobjective evolutionary algorithms (MOEAs) have been successfully used to solve challenging optimization problems in ML [29]-[31], where the regularization term is converted into another objective function. In this way, an MOEA avoids the choice of the regularization parameter in (2). However, existing MOEAs still cannot handle large-scale NRPs properly. The main contributions of our proposed method are summarized as follows.

1) Different from existing methods, SLEMO-NR uses binary-coded individuals to avoid the choice of a proper cut-off value, thus improving the ultimate performance. Thereafter, we propose to use the natural parameters and the probability distribution as two alternative representations of binary-coded individuals. In the evolutionary process, the algorithm searches for the probability of the existence of an edge instead of searching for the existence directly. The natural parameters, carrying the probability distribution information, are engaged in generating new solutions instead of the original binary-coded individuals.
In the end, the newly generated natural parameters are transformed into binary offspring individuals.

2) For large-scale NRPs, the search space grows exponentially with the number of decision variables, posing stiff challenges for EAs to precisely and efficiently approximate the network structure [32]. Logistic principal component analysis (LPCA) [33], an effective approach to extracting features from Bernoulli distributed data, is used to learn a subspace containing the features of the network structure. In this way, SLEMO-NR is capable of searching for optimal solutions in the subspace, alleviating the impact of the "curse of dimensionality".

3) In MOEAs, we obtain a set of non-dominated solutions, called Pareto solutions. However, for NRPs, our ultimate goal is to find a so-called k-sparse solution, where k is the true sparsity of the real network. Benefitting from the alternative representations, a preference-based local search operator (PLSO) is proposed in SLEMO-NR to find a k-sparse solution, and no prior estimation of the sparsity is required.

To validate the performance of SLEMO-NR, the evolutionary game (EG) [34] and resistor network (RN) [35] are employed as network models. First, 16 problems based on four kinds of benchmark networks with different configurations are designed to evaluate the effectiveness of subspace learning and the PLSO. After that, four representative problems are selected to analyze the sensitivity of the parameters. Finally, extensive experiments on six real-world networks demonstrate that SLEMO-NR is superior to six state-of-the-art approaches on large-scale NRPs.
The rest of this paper is organized as follows. Section II gives the basic formulation of the network reconstruction model and introduces two network models. The proposed SLEMO-NR is described in detail in Section III. Section IV presents the experiments and comparisons with six existing methods. Finally, conclusions are given in Section V.

A. Network Reconstruction Formulation
In a network, the interactions among N nodes are characterized by an N × N adjacency matrix. In particular, reconstructing the whole network can be decomposed into learning the local connections of each node individually [17], [19], [24]. Therefore, it is natural to decompose the entire problem into N independent subproblems. Besides, a node does not connect to itself, so self-loops are excluded. In a time period of length M, the states of nodes are recorded as A_net = [A_1, A_2, ..., A_N] and the outcomes of node interactions are recorded as Y_net = [Y_1, Y_2, ..., Y_N], where the ith slice A_i ∈ R^{M×(N−1)} and Y_i ∈ R^M are the time series of node i. For the ith subproblem, we have the following formulation:

Y_i = A_i X_i,   (3)

where X_i ∈ R^{N−1} encodes the connections of node i with the other nodes, and our goal is to reconstruct X_i from the recorded time series A_i and Y_i. Considering the measurement error and the sparsity of variables as two objectives, the above subproblem can be transformed into an MOP:

min_{X_i} ( ||Y_i − A_i X_i||_2^2, ||X_i||_0 ).   (4)

B. Network Models

1) Evolutionary Game Model: Evolutionary game dynamics is a branch of game theory that applies population dynamical methods and is commonly used to model node-to-node interactions in various complex systems [36]. In each round of an EG, players choose one of two strategies (S): cooperation (Co) or defection (De), expressed as S(Co) = (1, 0)^T and S(De) = (0, 1)^T. The payoffs of the two players are determined by the strategies they choose and the payoff matrix of the specific game being played. The prisoner's dilemma game (PDG) [37], a representative game model, is adopted in this paper. Its payoff matrix is

P = [ 1  0
      b  0 ],   (5)

where b ∈ (1, 2) is a parameter representing the temptation to defect. When both players choose to cooperate (or defect), their payoffs are 1 (or 0).
In the remaining cases, where the two players have different strategies, the defector gains payoff b while the cooperator gains the "sucker" payoff 0. In this paper, b is set to 1.2 [38]. For player i, its neighboring vector X_i stands for the connections with the other players, with x_ij = 1 if players i and j are connected and x_ij = 0 otherwise. At each round of the game, all players play the game with their neighbors. For player i at time step t, its payoff Y_i(t) is

Y_i(t) = Σ_j x_ij S_i(t)^T P S_j(t),   (6)

where S_i(t) is the strategy of player i. After each round, a player adjusts its strategy according to both its own and its neighbors' payoffs, with the purpose of maximizing its payoff in the next round. In our simulations of the EG model, the Fermi rule is adopted to update strategies:

W(S_i ← S_j) = 1 / (1 + exp[(Y_i − Y_j)/κ]),   (7)

where κ is a parameter characterizing the stochastic uncertainties in EG dynamics. According to (6), the NRP can be formulated in the form of (3), with the elements of A_i built from the strategy terms S_i(t)^T P S_j(t). It should be noted that Y_i is derived from the payoff data, A_i is obtained from the strategy data, and X_i, containing all possible connections, needs to be reconstructed. The matrix A_i and vectors Y_i, X_i satisfy (3).
The numerical simulation process of EG is described as follows. At the beginning of the game, each player chooses to cooperate or defect. At each round, the payoff of each player is calculated according to (6). All the payoffs and strategies in the process are recorded as time series for reconstructing the network.
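The payoff computation (6) and the Fermi updating rule can be sketched as follows. The payoff matrix is the one reconstructed from the text, with b = 1.2 and κ = 0.1 (κ's value is our assumption; the paper does not fix it here):

```python
import numpy as np

b, kappa = 1.2, 0.1
P = np.array([[1.0, 0.0],
              [b,   0.0]])            # PDG payoff matrix

S_CO = np.array([1.0, 0.0])           # cooperation strategy vector
S_DE = np.array([0.0, 1.0])           # defection strategy vector

def payoff(i, strategies, adj):
    """Payoff of player i: sum of S_i^T P S_j over its neighbors j, as in (6)."""
    s_i = strategies[i]
    return sum(s_i @ P @ strategies[j] for j in np.flatnonzero(adj[i]))

def fermi_prob(y_i, y_j):
    """Fermi rule: probability that player i adopts the strategy of player j."""
    return 1.0 / (1.0 + np.exp((y_i - y_j) / kappa))
```

On a 3-player line network 0-1-2 with player 1 defecting against two cooperators, player 1 earns 2b while the cooperators earn the sucker payoff, so the Fermi probability of a cooperator copying the defector is close to 1.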
2) Resistor Network Model: Resistor network dynamics is a kind of circuit system considering current transportation in resistors [35]. For two nodes i and j, the resistance of the resistor between them is denoted as R_ij. If i and j are directly connected by a resistor, R_ij = 1; otherwise, R_ij = ∞. According to Kirchhoff's law, we have

C_i = Σ_j (U_i − U_j) / R_ij,   (11)

where U_i and U_j are the voltages at nodes i and j, respectively, and C_i is the total current flowing through node i. To better approximate real situations, alternating current is taken into consideration. To this end, we have

U_i(t) = Ū sin[(ω + Δω_i) t],   (12)

where Ū = 1 is the peak voltage, ω = 10^3 is the frequency, and Δω_i ∈ [0, 20] is a perturbation. The RN model can be written in the form of (3), with the elements of A_i being V_ij = U_i − U_j and the elements of X_i being 1/R_ij, where 1/R_ij = 0 if i and j are disconnected and 1/R_ij = 1 otherwise. It is assumed that only the voltages and currents can be observed, and the resistor network is to be reconstructed.

[Figure: The decomposition strategy for learning large-scale NRPs.]

The numerical simulation process of RN can be described as follows. Each node is assigned an initial random number Δω_i ∈ [0, 20]. At time t, the voltages are calculated using (12) and the currents are obtained via (11). Both are recorded as time series over a period.
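A minimal simulation of the RN time series along the lines of (11) and (12) might look like the following sketch; the sampling times and the module-level random generator are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
U_bar, omega = 1.0, 1e3               # peak voltage and base frequency from the text

def simulate_rn(adj, times):
    """Record voltage and current time series for a resistor network.

    adj[i, j] = 1 means a unit resistor between i and j (1/R_ij = 1),
    adj[i, j] = 0 means no resistor (1/R_ij = 0).
    """
    N = adj.shape[0]
    d_omega = rng.uniform(0.0, 20.0, size=N)          # per-node perturbations
    # Voltages via (12): U_i(t) = U_bar * sin((omega + d_omega_i) * t)
    U = np.array([U_bar * np.sin((omega + d_omega) * t) for t in times])
    # Currents via Kirchhoff's law (11): C_i = sum_j (U_i - U_j) / R_ij
    C = np.array([[sum((U[ti, i] - U[ti, j]) * adj[i, j] for j in range(N))
                   for i in range(N)] for ti in range(len(times))])
    return U, C
```

For a symmetric adjacency matrix the currents sum to zero across the network at every time step, a quick sanity check on the Kirchhoff computation.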

A. Framework of SLEMO-NR
SLEMO-NR is an EA with network structure subspace learning and PLSO for solving large-scale NRPs, built on the framework of NSGA-II [39]. The whole NRP is first decomposed into N subproblems, which are then optimized separately by SLEMO-NR. For each subproblem, an initial population P_0 of size pop is first generated. In the following main loop, binary tournament selection is used to select pop individuals according to the non-dominated front number and crowding distance of each individual. Unlike NSGA-II, crossover and mutation operators are performed in a subspace to generate offspring from the selected individuals. Combining the offspring with P_t yields population P_t^com, which undergoes the preference-based local search focusing on k-sparse solutions; the generated solutions are also added to P_t^com. Finally, after deleting duplicated solutions, the elitist strategy lets pop individuals survive to the next generation. In short, compared to NSGA-II, SLEMO-NR searches for solutions in a subspace learned by LPCA and focuses on the k-sparse solution with the PLSO. The general framework of SLEMO-NR is presented in Fig. 1 and Algorithm 1. The main steps of SLEMO-NR are detailed in the following subsections.

Algorithm 1 SLEMO-NR
  Obtain the ith slice of Y_net and A_net as Y_i and A_i;
  Set t ← 0 and generate the initial population P_0;
  Dis ← CrowdingDistance(F_1, F_2, ...);
  while termination criteria are not satisfied do
    P_t^parents ← Selection(P_t, pop); // binary tournament selection of pop parents
    P_t^com ← P_t ∪ OffspringGenerating(P_t^parents); // Algorithm 2
    P_t^total ← P_t^com ∪ PLSO(P_t^com); // Algorithm 3
    Delete duplicated solutions in P_t^total; // update population
    t ← t + 1;
  end while

B. Alternative Representations of Binary-Coded Individuals
LPCA is an extension of principal component analysis (PCA) [40] that captures the features underlying Bernoulli distributed data. A projection of the data with minimum reconstruction error, which is the purpose of ordinary PCA, can be viewed alternatively as a projection of the natural parameters of a saturated model with minimum deviance. In our algorithm, the decision variable X_i is binary-coded, representing the neighboring connections of a node. Assume that x_ij obeys the Bernoulli distribution [33] with probability p_ij; its corresponding natural parameter θ_ij is

θ_ij = ln[ p_ij / (1 − p_ij) ].   (14)

We have the reverse formulation:

p_ij = 1 / (1 + e^{−θ_ij}).   (15)

The natural parameters of the saturated model are denoted as θ̃_ij. The saturated model, which is the best possible fit to the data, occurs on the condition that p_ij = x_ij [33], which also means

θ̃_ij = +∞ if x_ij = 1, and θ̃_ij = −∞ if x_ij = 0.   (16)

For convenience, the binary variable is converted from {0, 1} to {−1, 1} through the linear projection q_ij = 2x_ij − 1. Numerically, θ̃_ij can be approximated by

θ̃_ij ≈ m · q_ij   (17)

when m is large enough. In this paper, m is set to 10. Under this circumstance, the probability p_ij is close to 1.0 for x_ij = 1 and is 4.5 × 10^{−5} for x_ij = 0 according to (15) and (17).
With the above definitions, θ̃_ij and p_ij can thereby be regarded as alternative representations of the binary variable x_ij in the evolutionary process. Meanwhile, we can also decode x_ij from p_ij according to the Bernoulli distribution when the alternative representations are used [41].
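The mappings between a binary variable, its natural parameter, and its probability can be sketched as follows, with m = 10 as in the text; the deterministic rounding in `decode` is one of the two decoding options mentioned (the stochastic variant samples Bernoulli(p)):

```python
import numpy as np

M_PARAM = 10.0   # the constant m from the text

def natural_param(x):
    """theta_ij ≈ m * q_ij with q_ij = 2 * x_ij - 1, for binary x_ij."""
    return M_PARAM * (2.0 * np.asarray(x, dtype=float) - 1.0)

def probability(theta):
    """Inverse of the natural parameter: p_ij = 1 / (1 + exp(-theta_ij))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(theta, dtype=float)))

def decode(p, rng=None):
    """Decode probabilities back to a binary individual.

    Deterministic rounding by default; pass an rng to sample Bernoulli(p) instead.
    """
    p = np.asarray(p)
    if rng is not None:
        return (rng.random(p.shape) < p).astype(int)
    return (p > 0.5).astype(int)
```

For x_ij = 0 this reproduces the value p_ij ≈ 4.5 × 10^{−5} quoted in the text.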

C. Network Structure Subspace Learning and Offspring Generating
In recent years, LPCA has been proposed in the literature to extract features from binary data [33]. Several successful applications for solving challenging problems appeared in many fields, such as classification of RNA data [42], clustering and representation of multivariate data [43], and recommendation systems [44].
For large-scale NRPs, we aim to find the structure of the network. However, directly searching for the network structure is extremely hard and time-consuming. In this paper, a subspace containing the features of the network structure is learned via LPCA, in which the search for the network structure is much more efficient. Different from Pearson's initial formulation of PCA, LPCA seeks a rank-d (d < N − 1) projection of the natural parameters of the Bernoulli saturated model that minimizes the Bernoulli deviance, so that the natural parameters can be projected into a subspace and recovered with little deviance. Suppose there are n data vectors used to learn the subspace, each of which is (N − 1)-dimensional. Denoting the alternative representation of the ith vector as θ̃_i, its projection in the subspace is

θ_i' = W^T (θ̃_i − μ),   (18)

and the subspace solution θ_i' can be recovered into the original space:

θ̂_i = μ + W θ_i',   (19)

where μ ∈ R^{N−1} is the sample mean and W ∈ R^{(N−1)×d} is the principal component loadings matrix. Therefore, the natural parameters are estimated by

θ̂_i = μ + W W^T (θ̃_i − μ).   (20)

In matrix notation,

Θ̂ = 1_n μ^T + (Θ̃ − 1_n μ^T) W W^T,   (21)

where Θ̂ ∈ R^{n×(N−1)} and its ith row is θ̂_i^T, and Θ̃ is formed analogously from the θ̃_i. To solve for μ and W, the Bernoulli deviance

D(X; Θ̂) = −2 ( ⟨X, Θ̂⟩ − Σ_{ij} ln(1 + e^{θ̂_ij}) )   (22)

is minimized:

min_{μ, W} D(X; Θ̂)   s.t.   W^T W = I_d,   (23)

where ⟨·,·⟩ is the trace inner product and I_d is the identity matrix.
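The projection, recovery, and Bernoulli deviance steps can be sketched as follows. Since fitting LPCA is beyond this snippet, a random orthonormal matrix stands in for the learned loadings W (an assumption for illustration only):

```python
import numpy as np

def project(theta, mu, W):
    """Project natural parameters into the rank-d subspace: W^T (theta - mu)."""
    return W.T @ (theta - mu)

def recover(a, mu, W):
    """Map a subspace solution back to the original space: mu + W a."""
    return mu + W @ a

def bernoulli_deviance(X, Theta_hat):
    """D(X; Theta_hat) = -2 * sum(x * theta - log(1 + exp(theta))), elementwise."""
    Theta_hat = np.asarray(Theta_hat, dtype=float)
    return -2.0 * np.sum(X * Theta_hat - np.logaddexp(0.0, Theta_hat))

# Stand-in for learned LPCA parameters: orthonormal W (W^T W = I_d) and a mean mu
rng = np.random.default_rng(3)
n_vars, d = 8, 3
W, _ = np.linalg.qr(rng.normal(size=(n_vars, d)))
mu = rng.normal(size=n_vars)

# A point lying exactly in the subspace is recovered without loss
theta = mu + W @ rng.normal(size=d)
theta_hat = recover(project(theta, mu, W), mu, W)
```

Because W has orthonormal columns, projecting and recovering a point already in the subspace returns it unchanged, which is the "little deviance" property the text relies on.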
To effectively represent individuals in the subspace, selecting an appropriate subspace dimension d is of vital importance. The parameter d is selected such that a chosen proportion η of deviance, set to 95% in this paper, is met or exceeded. Let Ŵ_d denote the rank-d estimate of the principal component loadings. With the input data X, the corresponding rank-d estimate Θ̂_d, and sample mean μ̂, the parameter d is selected as the smallest integer satisfying

1 − D(X; Θ̂_d) / D(X; 1_n μ̂^T) ≥ η.   (24)

This criterion can be interpreted, as in standard PCA, as requiring that at least 100η% of the deviance is explained. In this way, little information is lost in the projection between the original space and the subspace. Note that with (N − 1) principal component loadings, we have Ŵ_{N−1}^T Ŵ_{N−1} = I_{N−1} and D(X; Θ̂_{N−1}) = 0; therefore, 100% of the deviance is explained by (N − 1) components, as expected.
Majorization-minimization (MM) [45] is adopted to minimize (23) iteratively. In the beginning, set t = 0 and let Θ̃_c^{(t)} := Θ̃ − 1_n μ^{(t)T} be the column-centered saturated parameters. As recommended in [46], μ^{(0)} is initialized to the column means of Θ̃, W^{(0)} is initialized to the first d right singular vectors of Θ̃_c^{(0)}, and W^{(t+1)} is updated with the first d eigenvectors of the intermediate matrix formed in the majorization step (see [45], [46] for details). Afterward, the subspace containing the network structure features is learned. By means of the principal component loadings matrix W and sample mean μ, the natural parameters of each solution can be mapped between the original space and the subspace. In the evolutionary process, the offspring are first generated in the learned network structure subspace and then mapped back into the original space to be evaluated.
The non-dominated solutions, which are superior to the other solutions, are considered to contain more features of the network structure. They are thereby adopted as an approximation of the network structure for learning the subspace. Thereafter, each time, two individuals randomly selected from the parent population are projected into the subspace by (18), where two offspring solutions are generated via single-point crossover and bitwise mutation. The network structure features learned in the subspace are transferred into the original space by recovering the offspring solutions with (20). Note that the natural parameters, as the alternative representation of binary vectors, are used in the genetic operators. The overall procedure of subspace learning and offspring generating is presented in Algorithm 2.

Algorithm 2 OffspringGenerating
Input: P_parents (selected parent population);
Output: O (generated offspring population);
...
[x_o1, x_o2] ← Obtain the natural parameters of x_p and x_q, recover them by (20), calculate their Bernoulli probabilities, and decode them to binary individuals;
...
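A sketch of the offspring-generating step, under our own simplifying assumptions: μ and W are given, mutation re-samples subspace coordinates (a stand-in for the paper's bitwise mutation), and decoding rounds the probabilities:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 10.0   # the constant m from Section III.B

def generate_offspring(x_p, x_q, mu, W, p_mut=None):
    """Single-point crossover plus mutation in the LPCA subspace.

    x_p, x_q are binary parents; mu, W are learned LPCA parameters.
    """
    d = W.shape[1]
    if p_mut is None:
        p_mut = 1.0 / d
    # Alternative representation (natural parameters), then projection by (18)
    a_p = W.T @ (m * (2 * x_p - 1.0) - mu)
    a_q = W.T @ (m * (2 * x_q - 1.0) - mu)
    # Single-point crossover on the subspace coordinates
    cut = rng.integers(1, d) if d > 1 else 0
    o1 = np.concatenate([a_p[:cut], a_q[cut:]])
    o2 = np.concatenate([a_q[:cut], a_p[cut:]])
    # Coordinate-wise mutation by re-sampling (illustrative assumption)
    for o in (o1, o2):
        mask = rng.random(d) < p_mut
        o[mask] = rng.normal(size=mask.sum())
    # Recover natural parameters by (20), compute probabilities, decode to binary
    children = []
    for o in (o1, o2):
        theta_hat = mu + W @ o
        p = 1.0 / (1.0 + np.exp(-theta_hat))
        children.append((p > 0.5).astype(int))
    return children
```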

D. Preference-Based Local Search Operator
In order to enhance the search ability, some approaches employ a local search operator, beyond crossover and mutation, to further perturb the individuals [24], [47]. Iterative soft-thresholding (IST), a representative local search operator, can effectively search for new solutions along the gradient descent direction. Nevertheless, IST is performed on real-coded variables and therefore cannot be used directly in our approach. Benefitting from the alternative representations, we propose a PLSO combining IST with a preference-based strategy to make our approach concentrate on finding k-sparse solutions. In the PLSO, IST is first performed on a selected solution to search for a new solution. Then, the preference-based strategy is performed to generate a group of solutions near the new solution. The details of the PLSO are described as follows.
1) Iterative Soft-Thresholding Search: The iterative soft-thresholding search [48] is designed to solve the relaxed L_1-regularization problem

min_{X_i} ||Y_i − A_i X_i||_2^2 + λ||X_i||_1.   (29)

There are three major steps in IST: gradient descent, parameter setting, and soft-thresholding truncation. However, gradient descent cannot be performed directly since the individuals are binary-coded. Thus, the alternative representation takes the place of the original individuals in the preference-based local search. According to the definitions in Section III.B, we have

p_ij = 1 / (1 + e^{−m(2x_ij − 1)}),   (30)

where x_ij and p_ij (j = 1, 2, ..., N − 1) are the jth elements of X_i and P_i, respectively. In IST, a new solution P_i^{r+1} is obtained from the previous solution P_i^r. Let f(X_i) be the first term of (29), which becomes f(P_i^r) after the replacement of variables. The major steps of IST are briefly introduced below.

a) Gradient Descent: For the current solution P_i^r (corresponding to X_i^r), its trial solution P̃_i is generated by

P̃_i = P_i^r − γ^r ∇f(P_i^r) ⊙ Ω_i,   (31)

where ∇f(P_i^r) is the gradient of f at P_i^r, "⊙" is the Hadamard product, and Ω_i is a vector containing the partial derivatives of x_ij^r with respect to p_ij^r.

b) Parameter Setting: The step-size parameter γ^r is calculated by optimizing the Barzilai-Borwein equation [49]

γ^r = argmin_γ ||s^r − γ y^r||_2^2,

where s^r = (1/2m) ln[(P_i^r ⊘ (1 − P_i^r)) ⊘ (P_i^{r−1} ⊘ (1 − P_i^{r−1}))] (elementwise) is the difference between the current and supporting solutions in the relaxed coordinates, and y^r = ∇f(P_i^r) − ∇f(P_i^{r−1}).

c) Soft-Thresholding Truncation: The trial solution P̃_i is truncated elementwise by the soft-thresholding operator with threshold λ_st γ^r. As suggested in [50], λ_st is generated by a decreasing sequence {λ_r, r = 0, 1, ..., n}, where λ_0 > λ_1 > ... > λ_n, λ_0 = 0.1||A_i^T Y_i||_∞, and λ_n → 0.

2) Preference-Based Local Search Strategy: MOEAs aim to find an evenly distributed Pareto front, balancing exploration and exploitation in the evolutionary process. In practice, however, approximating the region far from the k-sparse solution does not help infer the network structure. To amend this drawback, the preference-based strategy is employed to focus on the local part of the weakly Pareto front containing the optimal k-sparse solution, which improves the search efficiency a great deal.
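The soft-thresholding operator and a bare-bones IST step can be sketched as follows; for brevity this sketch omits the chain-rule factor Ω_i of (31) and the Barzilai-Borwein step-size rule, using fixed γ and λ instead (our simplification, not the paper's full procedure):

```python
import numpy as np

def soft_threshold(z, tau):
    """Elementwise soft-thresholding: sign(z) * max(|z| - tau, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ist_step(p, A, y, gamma, lam):
    """One simplified IST step on the probability representation:
    gradient descent on f(p) = ||y - A p||_2^2, then soft-thresholding truncation."""
    grad = -2.0 * A.T @ (y - A @ p)
    return soft_threshold(p - gamma * grad, lam * gamma)
```

With a suitably small step size, one such step decreases the loss term while the truncation pushes small entries to exactly zero, which is what drives solutions toward sparsity.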
In PLSO, there are two parts of information to be maintained.
a) The archived population E, which contains part of the solutions in the population. It is divided into two subsets with a cut-off:

E_1 = {X ∈ E | f_1(X) < f_1^min + ε},   (36)

where f_1^min is the minimal value of the first objective function found so far in E, and ε is a parameter with a positive value.

b) The H-neighborhood, which describes a small region near a solution X_i in terms of sparsity levels, is considered to ensure the effectiveness of the generated solutions. Here H is the neighborhood size.

At first, the archived population E needs to be determined. Since the non-dominated solutions in P_t^com are closer to the optimal solutions, they are adopted to form E and to generate new solutions through the preference-based local search. Then, the subset E_1 is determined according to (36). The parameter ε, usually a rather small positive value, is adjusted to control the quality of the solutions selected for the following soft-thresholding procedure. Since the values of the objective functions vary from problem to problem, ε is set to a percentage of (f_1^max − f_1^min), and E_2 is set in a similar way as the remainder of E. The first objective function f_1 reveals, to some extent, the deviance of a solution from the true network structure, which means it should be optimized with priority.
With respect to f_1, solutions in E_1 are superior to those in E_2. In this way, a starting solution X_i^r is randomly selected from E_1. An H-neighborhood of X_i^r is thereafter determined, with k_l = ||X_i^r||_0 − H and k_u = ||X_i^r||_0 + H as the lower and upper bounds, respectively.
After the above-mentioned preparations, IST with multi-level truncation is implemented to generate new solutions, as illustrated in Fig. 2. To choose an adaptive step size for the gradient descent operation, a supportive solution is selected from P_t^com. As proved by Li et al. [47], if X_i^r is chosen from the non-dominated solutions and X_i^{r−1} from the dominated solutions, there is a higher probability of generating improved non-dominated solutions via IST. Therefore, X_i^{r−1} is deliberately selected from the solutions dominated by X_i^r. Note that X_i^r and X_i^{r−1} are binary-coded; they need to be represented by their probability distributions before performing gradient descent. After that, a new solution P_i^{r+1} is generated by IST.
Following the IST search procedure, which generates only one solution at a time, the main purpose of multi-level truncation is to obtain a group of solutions in the H-neighborhood of X_i^r. By truncating the new solution P_i^{r+1} at different sparsity levels, no more than 2×H solutions with sparsity in the range [k_l, k_u] are obtained and added to Ẽ, avoiding too much extra computational cost.
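The multi-level truncation step can be sketched as follows; `truncate_at_sparsity` and `multilevel_truncation` are our own illustrative names, and this sketch simply enumerates every level in [k_l, k_u]:

```python
import numpy as np

def truncate_at_sparsity(p, k):
    """Keep the k largest-probability entries as links, zero elsewhere."""
    x = np.zeros_like(p, dtype=int)
    if k > 0:
        x[np.argsort(p)[-k:]] = 1
    return x

def multilevel_truncation(p, k_center, H):
    """Truncate one IST solution at every sparsity level in [k_center - H, k_center + H]."""
    k_l, k_u = max(k_center - H, 0), min(k_center + H, len(p))
    return [truncate_at_sparsity(p, k) for k in range(k_l, k_u + 1)]
```

Each truncated solution is a candidate with a different assumed sparsity, which is exactly the family of neighbors the PLSO adds to Ẽ.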
At last, all the generated solutions in Ẽ are used to update the archived population E, as illustrated in Fig. 3. It should be noted that the subsets E_1 and E_2 are updated separately. The newly generated solutions are first added to E, and inferior solutions are removed according to certain principles; then E is separated into E_1 and E_2. Moreover, as the k-sparse solution is more likely to be located at the center of the H-neighborhood, the solutions in E with the smallest and largest sparsity levels are not preferred. The details of the PLSO are described in Algorithm 3.

IV. EXPERIMENTS
In this section, a series of experiments is conducted to empirically validate the performance of our algorithm. Section IV.A gives the experimental settings. We design 16 problems based on four kinds of benchmark networks, whose configurations are shown in Section IV.B. In Section IV.C, the performance of our LPCA-based network structure subspace learning approach is verified. Section IV.D then demonstrates the effectiveness of the PLSO. The analysis of the effect of the two major parameters in our algorithm is conducted in Section IV.E. After that, we compare SLEMO-NR with six state-of-the-art methods on six real-world networks in Section IV.F. Finally, the computational efficiency analysis is given in Section IV.G.

A. Experimental Setup
To quantify the performance of network reconstruction methods, two common indices are adopted, namely the area under the receiver operating characteristic curve (AUC) [51] and the Matthews correlation coefficient (MCC) [52]. Both are capable of producing a truthful and informative score when evaluating performance.

Algorithm 3 Preference-BasedLocalSearchOperator (PLSO)
Input: P_t^com (combined population);
Output: P_t^local (generated population);
1: E ← Non-dominated solutions in P_t^com;
2: Determine E_1 according to (36);
3: X_i^r ← Randomly select a starting solution from E_1;
4: Determine sparsity levels in an H-neighborhood: k_l = ||X_i^r||_0 − H, k_u = ||X_i^r||_0 + H;
// Iterative soft-thresholding search
5: X_i^{r−1} ← Randomly select a supporting solution dominated by X_i^r from P_t^com;
6: P̃_i ← Obtain the alternative representations of X_i^r and X_i^{r−1}, and generate a trial solution using (31);
...
Set the smallest nonzero element of z to zero, and Ẽ ← Ẽ ∪ {z};
12: end while
13: Decode the solutions in Ẽ into binary-coded individuals;
// Update archived population E
14: for each z ∈ Ẽ do
15:   if f_1(z) < f_1^min + ε then
16:     If ∃ z' ∈ E with the same sparsity as z, keep the one with smaller f_1; otherwise, E ← E ∪ {z};
...
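The MCC index, computed from the confusion matrix of predicted versus true links, can be sketched as:

```python
import numpy as np

def mcc(x_pred, x_true):
    """Matthews correlation coefficient for binary link prediction."""
    x_pred, x_true = np.asarray(x_pred), np.asarray(x_true)
    tp = int(np.sum((x_pred == 1) & (x_true == 1)))   # true positives
    tn = int(np.sum((x_pred == 0) & (x_true == 0)))   # true negatives
    fp = int(np.sum((x_pred == 1) & (x_true == 0)))   # false positives
    fn = int(np.sum((x_pred == 0) & (x_true == 1)))   # false negatives
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0
```

MCC ranges from −1 (total disagreement) through 0 (no better than chance) to +1 (perfect reconstruction), which makes it informative even on the heavily imbalanced link/no-link distributions of sparse networks.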
In the experiments, an initialization operator based on LASSO is employed to generate the initial population. In [23], this initialization operator was shown to be capable of accelerating the convergence of the EA. It should be noted that any state-of-the-art L_1-minimization method can be adopted instead, such as orthogonal matching pursuit (OMP) [53] or the fast iterative shrinkage-thresholding algorithm (FISTA) [54]. For fair comparisons, this initialization is implemented in all the EA-based algorithms to guarantee an equal and better initial state. For L_1-minimization problems, the choice of the regularization parameter λ is critical: with an ill-chosen parameter, the results can be far from the true network structure. In practice, we generate a group of solutions via LASSO with different λ values; for each individual in the initial population, we uniformly sample a value between 0 and 1 as λ. To maintain the diversity of the initial population and accelerate the process, only a few iterations (20 in this paper) are needed for the LASSO initialization. After that, a cut-off value of 0.5 is used to transform the real-coded results of LASSO into binary-coded individuals.
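A sketch of this initialization: a few plain ISTA iterations stand in here for the LASSO solver (the paper allows any L_1-minimization routine, e.g. OMP or FISTA), followed by the 0.5 cut-off; the step-size choice via the spectral norm is a standard assumption, not a detail from the paper:

```python
import numpy as np

def lasso_init(A, y, lam, n_iter=20):
    """Run a few ISTA iterations for min ||y - Ax||^2 + lam*||x||_1,
    then binarize the result at the 0.5 cut-off."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = -2.0 * A.T @ (y - A @ x)       # gradient of the loss term
        z = x - grad / L                      # gradient descent step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return (x > 0.5).astype(int)
```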
Besides, the proposed SLEMO-NR is an MOEA and therefore provides a group of candidate solutions, where the first objective is the reconstruction error and the second is the sparsity level. During optimization, the real network structure is unknown, so two solutions cannot be compared directly. The reconstruction error indicates, to some extent, how close a solution is to the optimal one, and thus reflects its superiority over other solutions. Considering only the first objective is therefore a simple and effective way to select solutions. Accordingly, in this paper, we select the individual in P_final with the smallest value of the first objective function in each subproblem to compose the final reconstructed network.
The parameters in SLEMO-NR are set as follows. For all the experiments, the population size is set to 100 and the number of function evaluations is set to 20000. With respect to the single-point crossover and bitwise mutation, the crossover and mutation probabilities are set to 1 and 1/(N −1), respectively, where (N − 1) is the dimension of decision variables. The major parameters in the PLSO are H and ε, both of which are set to 1.
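A minimal sketch of the genetic operators with these settings (illustrative code, not the authors' implementation): single-point crossover applied with probability 1, and bitwise mutation flipping each bit with probability 1/(N − 1), where the chromosome length equals the number of decision variables N − 1.

```python
import numpy as np

def single_point_crossover(p1, p2, rng, pc=1.0):
    """Single-point crossover; pc = 1 as in the experiments."""
    c1, c2 = p1.copy(), p2.copy()
    if rng.random() < pc:
        cut = rng.integers(1, len(p1))     # split point in [1, len-1]
        c1[cut:], c2[cut:] = p2[cut:], p1[cut:]
    return c1, c2

def bitwise_mutation(x, rng):
    """Flip each bit independently with probability 1/(N - 1);
    here len(x) = N - 1 is the number of decision variables."""
    pm = 1.0 / len(x)
    flip = rng.random(len(x)) < pm
    return np.where(flip, 1 - x, x)
```

With this mutation rate, one bit is flipped per individual in expectation, a standard choice for binary-coded EAs.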

B. Test Problems
Based on four kinds of benchmark networks, namely Erdős-Rényi random networks (ER) [55], Barabási-Albert scale-free networks (BA) [56], Newman-Watts small-world networks (NW) [57], and Watts-Strogatz small-world networks (WS) [58], we design 16 problems varying in network model, network type, the number of nodes N, and the average degree of nodes D. The detailed configurations of the 16 designed problems are presented in Table I. For each test problem, 30 independent runs are conducted for each MOEA to obtain statistical results. Moreover, to ensure the reliability of the observed results, the Wilcoxon rank-sum test with a significance level of 0.05 is utilized, where '+', '−', and '≈' indicate that the result of another approach is significantly better than, significantly worse than, and statistically similar to that of SLEMO-NR, respectively.
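For instance, an ER benchmark adjacency matrix with a prescribed average degree D can be generated as below. This NumPy sketch is ours (the BA, NW, and WS models are typically generated with standard graph libraries) and uses edge probability p = D/(N − 1), so that the expected average degree equals D.

```python
import numpy as np

def er_network(n, avg_degree, rng):
    """Erdos-Renyi G(n, p) adjacency matrix; p = avg_degree / (n - 1)
    makes the expected average degree equal avg_degree."""
    p = avg_degree / (n - 1)
    upper = rng.random((n, n)) < p
    adj = np.triu(upper, k=1)              # keep the strict upper triangle
    adj = (adj | adj.T).astype(int)        # symmetrize; diagonal stays zero
    return adj
```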

C. Effectiveness of LPCA in Offspring Generating
To validate the effectiveness of LPCA, SLEMO-NR is compared with three of its variants. We denote SLEMO-NR without LPCA, SLEMO-NR with RBM [59], and SLEMO-NR with ordinary PCA as SLEMO-NR I, SLEMO-NR II, and SLEMO-NR III, respectively. In particular, SLEMO-NR III utilizes real-coded variables throughout the evolutionary process to maintain the effectiveness of PCA. Without LPCA, the search is conducted in the original space, which means the offspring are generated in the same way as in NSGA-II. Moreover, although many machine-learning-based methods are capable of learning a subspace, most of them, apart from RBM, cannot deal with binary variables and are not invertible; they are thereby unable to be used in the genetic operator. Figs. 4 and S1 (in Supplementary Materials) illustrate the comparisons of SLEMO-NR with its three variants in terms of the average values of AUC and MCC on Problems 1-16 when N_M = 0.3 and 0.5. The detailed numerical results are shown in Supplementary Materials S.I. As can be seen from the figures, the original SLEMO-NR is significantly better or statistically similar in terms of AUC and MCC in 29 out of 32 cases and loses once to SLEMO-NR I. This is attributable to the network structure subspace learning and the consideration of information loss in the projection process. With LPCA, the search for optimal solutions is conducted in a subspace containing the features of the network structure, so SLEMO-NR is more effective than SLEMO-NR I. Furthermore, the selection of the subspace dimension for LPCA guarantees that little information is lost in the projection between the original space and the subspace, whereas in SLEMO-NR II, the subspace dimension is chosen according to an adaptation strategy from [60], which has been proved more effective than a constant setting.
As shown in the figures, SLEMO-NR I outperforms or matches SLEMO-NR II in 26 out of 32 cases in terms of AUC and in 24 out of 32 cases in terms of MCC, indicating that RBM with the subspace-dimension adaptation strategy may fail to effectively capture the features of the network structure and may even harm the performance in some cases. Besides, SLEMO-NR matches or exceeds SLEMO-NR III in 30 out of 32 cases, which not only shows the effectiveness of LPCA but also demonstrates the advantage of using binary-coded variables.
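The role of the learned subspace can be illustrated with the following heavily simplified sketch: binary individuals are encoded as Bernoulli natural parameters, a low-dimensional linear subspace is fitted (here by a plain SVD as a stand-in for the paper's LPCA fit), offspring are perturbed inside the subspace, mapped back to the original space, and decoded through the logistic function. All names and the constants `scale` and `noise` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def subspace_offspring(pop, dim, rng, scale=4.0, noise=0.5):
    """Sketch: encode bits as natural parameters theta = scale*(2x - 1),
    project onto a dim-dimensional SVD subspace (stand-in for LPCA),
    perturb there, and decode back to binary offspring."""
    theta = scale * (2.0 * pop - 1.0)             # Bernoulli natural parameters
    mu = theta.mean(axis=0)
    U, s, Vt = np.linalg.svd(theta - mu, full_matrices=False)
    basis = Vt[:dim]                              # learned structural directions
    codes = (theta - mu) @ basis.T                # project into the subspace
    codes += rng.normal(0.0, noise, codes.shape)  # search in the subspace
    theta_new = codes @ basis + mu                # map back to the original space
    return (rng.random(pop.shape) < sigmoid(theta_new)).astype(int)
```

The point of the sketch is the round trip: variation happens in a space whose axes summarize the population's structural regularities, so perturbations tend to preserve plausible network features rather than flipping bits at random.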

D. Effectiveness of Preference-Based Local Search Operator
In this section, we demonstrate the effectiveness of the PLSO on SLEMO-NR. Experiments are conducted on the benchmark problems in Table I. There are a few cases in which this operator has a side effect on SLEMO-NR, possibly because the PLSO leads the population to a locally optimal area, since it is based on a gradient-descent approach and the NRP is non-convex.
Compared with SLEMO-NR, SPLS fails to achieve satisfactory performance; in particular, in some cases where SLEMO-NR has already perfectly reconstructed the network, the result of SPLS is still far from the true network. Although both methods utilize IST and the preference-based strategy, there are two major differences. First, SLEMO-NR selects a supporting solution to obtain an adaptive step size for IST, whereas in SPLS the step size is constant. Second, the search area of SPLS is restricted to a small region near the solutions generated by IST, whereas Algorithm 3 acts as a local search operator performed in each generation on the current population generated by the genetic operators; it thus combines the search abilities of the genetic operators and IST, improving the capability of finding optimal solutions compared with SPLS.

E. Analysis of the Parameter Sensitivity
In this subsection, the effect of two parameters on the performance of SLEMO-NR is investigated, since the search ability of the PLSO is affected by both. The first parameter, ε, determines the cut-off between the two subsets E_1 and E_2 and thus affects the selection of solutions on which IST is performed. Note that when ε = 0, this selection is restricted to the solution with the smallest value of the first objective function. The second parameter is the neighborhood size H in the PLSO, which controls the breadth of the search. With a larger value of H, the solutions generated through the multi-level truncation may be farther away from the originally selected solution; if H is set to 0, the PLSO is disabled.
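The multi-level truncation over the H-neighborhood can be sketched as follows: given a real-valued trial solution, one k′-sparse candidate is produced for every sparsity level k′ in [k − H, k + H] by keeping the largest-magnitude entries. This is an illustrative sketch, not the exact operator of Algorithm 3.

```python
import numpy as np

def truncate_to_levels(z, k, H):
    """Keep the k' largest-magnitude entries of the trial solution z
    for every sparsity level k' in the H-neighborhood [k-H, k+H]."""
    order = np.argsort(-np.abs(z))             # indices by decreasing |z|
    solutions = []
    for kp in range(max(k - H, 1), k + H + 1):
        x = np.zeros_like(z)
        x[order[:kp]] = z[order[:kp]]          # k'-sparse truncation
        solutions.append(x)
    return solutions
```

A larger H yields more candidate sparsity levels and hence a broader but shallower search around the selected solution, consistent with the sensitivity analysis below.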
To analyze the sensitivity of the parameters, Problems 3, 8, 9, and 14 in Table I are selected as representative problems; for any two of them, there are at least two differences in configuration. The experimental results are summarized as follows.
1) Fig. 6 shows the performance of our algorithm on the four problems in terms of MCC under different N_M, where ε takes values from {0, 1, 5, 10, 20} and H is set to 1. We observe that SLEMO-NR achieves the optimal, or at least suboptimal, performance when ε = 1. When ε = 5 or 10, good performance may occasionally be obtained, but the overall performance is not good enough.

2) Fig. 7 shows the performance of our algorithm on the four problems in terms of MCC under different N_M, where H takes values from {0, 1, 5, 10, 20} and ε is set to 1. As can be seen from the figures, the setting of H has an impact on the performance, especially when the length of the data is not large enough. With H set to 1, our algorithm achieves the best performance in all but one case. Moreover, when H is set to a relatively small value, better performance is obtained than with H = 0 (i.e., without the PLSO) in most cases.
According to the experiments conducted in this subsection, these parameters influence the performance of SLEMO-NR to some extent. When the amount of data is sufficient, such as N_M ≥ 0.5 in Problems 3, 8, and 9 and N_M ≥ 0.6 in Problem 14, SLEMO-NR can always fully reconstruct the networks regardless of the choice of these parameters. In the other cases, when the amount of data is limited, properly selected parameters enhance the overall performance significantly. With both parameters set to small values, the PLSO is mainly conducted near the solutions with smaller values of the first objective function, which helps perform a deeper search and find non-dominated solutions. Note that the characteristics of different NRPs may vary greatly, so a proper combination of parameters for one NRP may not be suitable for the others and needs to be adjusted accordingly.

F. Application to Real-world Networks

In this subsection, we validate the performance of SLEMO-NR on six real-world networks: polbooks, USAir97, 130bit, bibd-12-5, TSOPF RS b162 c1 (TSOPF for short) [61], and email-Eu-core [62]. The number of nodes N and the number of links L of these networks are presented in Table II. The datasets used to reconstruct the networks are generated using the EG and RN models described in Section II, and N_M is set to {0.2, 0.3, 0.4, 0.5} for each network. The performance of SLEMO-NR is compared with six state-of-the-art methods. As ML-based approaches, we select CS, LASSO, and OMP for contrast. Furthermore, three EA-based algorithms are employed as comparisons: MAST-Net [24] is a memetic algorithm specially designed for NRPs, which has shown superiority on small-scale networks; SparseEA [63] and MOEA/PSL [60] have been validated to be effective on large-scale MOPs and are implemented on the evolutionary multiobjective optimization platform PlatEMO [64].
For the comparative methods, we use identical parameters suggested in the original papers and codes. For all EA-based methods, the number of function evaluations is set to 100 × N for fair comparisons. All experimental results in terms of AUC and MCC are obtained by averaging over 30 runs on each dataset. The Wilcoxon rank-sum test with a significance level of 0.05 is utilized, where '+', '−', and '≈' indicate that the result of another approach is significantly better, significantly worse, and statistically similar to that of SLEMO-NR, respectively.
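The significance labeling used in the result tables can be reproduced with SciPy's rank-sum test. The helper below is our own sketch ('~' stands in for '≈'), assuming higher scores are better and thirty scores per method.

```python
import numpy as np
from scipy.stats import ranksums

def significance_mark(other, ours, alpha=0.05):
    """Return '+', '-', or '~' comparing another method's scores
    against SLEMO-NR's over repeated runs (higher is better)."""
    stat, p = ranksums(other, ours)
    if p >= alpha:
        return '~'                         # statistically similar
    return '+' if np.mean(other) > np.mean(ours) else '-'
```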
The results of the application to EG network reconstruction are shown in Tables III and SV (in Supplementary Materials), which demonstrate the significant performance of SLEMO-NR. Besides, we use four exemplary cases, namely subproblems selected from the polbooks and 130bit datasets based on the EG model with N_M = 0.2 and N_M = 0.4, respectively, to demonstrate our algorithm's capacity to reconstruct networks. For a given subproblem, we plot all the non-dominated solutions obtained by the MOEAs and the best solutions obtained by the other methods. The results are shown in Fig. 8. As illustrated in these figures, SLEMO-NR concentrates more on the regions near the optimal solutions and obtains the solutions with the smallest reconstruction error in most cases. The compared methods are likely to get stuck in small regions far from the optimal solutions and thus achieve unsatisfactory performance.

G. Computational Efficiency Analysis
Lastly, the computational complexity of the proposed algorithm and its runtime compared with the other algorithms are analyzed. In the initialization process, we use LASSO in this paper. The population update process is the same as that of NSGA-II, which has a complexity of O(pop^2) [39]. Furthermore, all the above steps constitute the optimization process of one subproblem, and the whole NRP consists of N subproblems. In summary, the overall computational complexity of SLEMO-NR in one generation is O(N^4 + pop·N^2·d + H·N·pop^2).
Besides, the runtimes (in seconds) of all the compared algorithms are shown in Table V, where all the experiments are conducted on the six real-world datasets based on the RN model with N_M = 0.4. Compared with the ML-based algorithms, the EA-based algorithms take longer because they maintain a group of solutions, but they obtain better results in most cases. To compare the ML-based and EA-based algorithms fairly, we increase the maximum number of iterations of the ML-based algorithms so that they consume the same computational cost as the EAs, but their performance fails to improve. MAST-Net is a single-objective EA, so it runs slightly faster than the MOEAs. Since SLEMO-NR needs to learn network structure subspaces and perform the preference-based local search, its runtime is slightly longer than that of the two other MOEAs in three out of six cases. In short, the proposed SLEMO-NR has competitive efficiency compared to the other algorithms.
V. CONCLUSION

Due to the difficulty of setting an appropriate cut-off value and the curse of dimensionality in solving large-scale network reconstruction problems, the performance of existing methods is far from satisfactory. To overcome these shortcomings, we propose a subspace-learning-based evolutionary multiobjective network reconstruction algorithm, termed SLEMO-NR. The core of SLEMO-NR is to use alternative representations in place of binary-coded individuals in the evolutionary process. The alternative representations not only carry the information of the individuals but also enable local search for better solutions. Furthermore, we use LPCA to learn a subspace containing the features of the network structure; searching for optimal solutions in this subspace is much more effective than in the original space. The offspring solutions are generated in the subspace and mapped back to the original space. Afterward, a preference-based local search operator is proposed to concentrate on finding k-sparse solutions. The experimental results on synthetic networks show that the network structure subspace learning and the preference-based strategy can significantly improve the performance of SLEMO-NR. Extensive experiments on six real-world networks validate the superiority of SLEMO-NR over six methods in solving large-scale network reconstruction problems.
This paper shows the necessity of network structure subspace learning in solving large-scale network reconstruction problems. Although the proposed method achieves significant performance by using LPCA, it may become impractical for networks with millions of nodes; further investigation of this problem is therefore desirable. Moreover, we plan to explore broader applications of SLEMO-NR in more challenging and practical problems.