Improved Leach algorithm for detecting opinion leaders in social networks

Today, the growth of human communication through social networks has increased the importance of social network analysis. In social networks, nodes do not all play the same role, and important nodes play a key role in network performance. The nodes that have more influence in a social network are called "opinion leaders". Finding the opinion leaders of a social network requires measuring the importance of each node. This importance is measured using centrality criteria on the network's graph. In this paper, the well-known Leach algorithm, which is used to find cluster heads in wireless sensor networks, is modified to determine opinion leaders in social networks. The proposed algorithm is evaluated using centrality indicators (degree, closeness, betweenness and eigenvector). Simulation results show that the output of the proposed algorithm is significantly related to the standard centrality indices, with a correlation value greater than 0.7.


1-1 Leach algorithm
Leach is a cluster-based routing protocol for wireless sensor networks [5]. The purpose of this protocol is to reduce the energy consumption of the nodes in order to extend the lifetime of the wireless sensor network. In the standard Leach algorithm, the nodes are organized into clusters, and in each cluster one node is selected as the cluster head [6]. In each round of the algorithm, a predetermined percentage p of the nodes are selected as cluster heads. Each node is assigned a random value between 0 and 1. If this number is less than the threshold specified in Eq. (1), the node becomes a cluster head for the current round; otherwise it continues to operate as a normal node [7].
$$T(n) = \begin{cases} \dfrac{p}{1 - p \left( r \bmod \frac{1}{p} \right)} & \text{if } n \in G \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
In Eq. (1), p is the predetermined desired proportion of cluster heads for each round, r is the current round number, and G is the set of nodes that were not cluster heads in the previous 1/p rounds. This equation is designed so that within every 1/p rounds each node becomes a cluster head exactly once, so the energy consumption is distributed throughout the network. Various versions of the Leach algorithm, such as E-LEACH, TL-LEACH, M-LEACH, LEACH-C and V-LEACH, have been proposed by researchers; more details can be found in [8][9][10][11][12].
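The election rule described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: it assumes the commonly cited Leach threshold T(n) = p / (1 - p (r mod 1/p)) for nodes in G and 0 otherwise, and the names `leach_threshold`, `run_leach` and `eligible` are hypothetical. Energy, clustering and routing are not modeled.

```python
import random


def leach_threshold(p, r, in_G):
    """Threshold T(n) for round r; p is the cluster-head fraction (e.g. 0.25)."""
    if not in_G:
        return 0.0
    epoch_length = round(1 / p)            # an epoch lasts 1/p rounds
    return p / (1 - p * (r % epoch_length))


def run_leach(node_ids, p, rounds, seed=0):
    """Return the list of cluster heads elected in each round."""
    rng = random.Random(seed)              # seeded only to make the sketch repeatable
    epoch_length = round(1 / p)
    eligible = set(node_ids)               # the set G
    history = []
    for r in range(rounds):
        if r % epoch_length == 0:
            eligible = set(node_ids)       # new epoch: every node is eligible again
        heads = [n for n in node_ids
                 if rng.random() < leach_threshold(p, r, n in eligible)]
        eligible.difference_update(heads)  # elected nodes leave G until the next epoch
        history.append(heads)
    return history


# Assumed example: 20 nodes, 25% cluster heads, two epochs of 4 rounds each.
history = run_leach(list(range(20)), p=0.25, rounds=8)
```

In the last round of each epoch the threshold reaches 1, so every node still in G is elected. This is what guarantees that each node serves as cluster head once per epoch.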

1-2 Centrality criteria
Attempts to define graph centrality criteria date back to the 1950s. Different centrality criteria have been introduced to measure the influence of actors in a social network. The most common are: 1) degree centrality, 2) closeness centrality, 3) betweenness centrality and 4) eigenvector centrality. These centrality criteria are briefly described as follows:
1) Degree Centrality of a vertex is defined as the number of edges incident to that vertex:
$$C_D(v) = \sum_{j} A_{vj} \qquad (2)$$
where $A$ represents the adjacency matrix of the graph.
2) Closeness Centrality measures how close a vertex is to all other vertices in the network's graph. It is based on the length of the paths, i.e. the number of steps, that one person needs to reach the other people in a social network:
$$C_C(v) = \frac{1}{\sum_{t \neq v} d(v, t)} \qquad (3)$$
where $d(v, t)$ represents the distance between vertices v and t.
3) Betweenness Centrality of vertex v is the number of times that shortest paths between other pairs of vertices pass through v. The betweenness centrality of vertex v is calculated as follows:
$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \qquad (4)$$
where $\sigma_{st}$ represents the total number of shortest paths from node s to node t and $\sigma_{st}(v)$ indicates the number of those paths that pass through v.

4) Eigenvector Centrality (also called eigencentrality or prestige score) is a measure of the influence of a node in a network. The eigenvector centrality score of node i, denoted by $x_i$, is calculated as follows:
$$x_i = \frac{1}{\lambda} \sum_{j \in N(i)} x_j \qquad (5)$$
where $N(i)$ indicates the set of neighbors of node i and $\lambda$ is a constant called the eigenvalue.
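The four criteria listed above can be sketched in plain Python for a small undirected, unweighted graph given as a 0/1 adjacency matrix. This is an illustrative sketch under stated assumptions, not the paper's tooling: closeness is taken as the reciprocal of the summed shortest-path distances, betweenness uses Brandes' accumulation scheme, and eigenvector centrality uses power iteration on A + I, scaled so that the largest score is 1. All function names and the example graph `star` are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Degree of each vertex: row sums of a 0/1 adjacency matrix."""
    return [sum(row) for row in adj]


def closeness_centrality(adj):
    """Reciprocal of the summed BFS distances to all reachable vertices."""
    n = len(adj)
    scores = []
    for v in range(n):
        dist = [-1] * n
        dist[v] = 0
        queue = deque([v])
        while queue:
            u = queue.popleft()
            for w in range(n):
                if adj[u][w] and dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(d for d in dist if d > 0)
        scores.append(1.0 / total if total else 0.0)
    return scores


def betweenness_centrality(adj):
    """Brandes' accumulation for an unweighted, undirected graph."""
    n = len(adj)
    scores = [0.0] * n
    for s in range(n):
        order = []                          # vertices in BFS visiting order
        preds = [[] for _ in range(n)]      # shortest-path predecessors
        sigma = [0] * n                     # number of shortest paths from s
        dist = [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in range(n):
                if not adj[u][w]:
                    continue
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = [0.0] * n
        for w in reversed(order):           # farthest vertices first
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                scores[w] += delta[w]
    return [b / 2.0 for b in scores]        # each unordered pair was counted twice


def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Power iteration on A + I (the shift avoids oscillation on bipartite graphs)."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(max_iter):
        nxt = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        top = max(nxt)
        nxt = [value / top for value in nxt]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Assumed example: a star graph in which vertex 0 is linked to vertices 1, 2 and 3.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
```

On the star graph all four measures rank the central vertex first, which is the kind of agreement between criteria that the correlation analysis later in the paper quantifies.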
In this paper, using the concept of centrality and applying the improved Leach algorithm, opinion leaders of social networks are determined. By modifying the standard Leach algorithm and evaluating the results using the centrality criteria, a new criterion for determining centrality and finding opinion leaders has been proposed.

2-Related Works
Social network analysis is one of the most important fields in sociology, economics and computer science. Okamoto et al. [13] considered all the characteristics attributed to opinion leaders, influential people, market experts and key actors, and presented a complete classification of the characteristics of opinion leaders, including structural, relational and individual ones. Numerous studies have examined the types of centrality criteria; an overview can be found in [14]. In [14], the performance of social networks was analyzed using the concept of centrality (degree centrality and closeness centrality) [15,16].
In [17], a solution for clustering social networks based on the centrality of nodes is proposed using graph mining algorithms. In the paper presented by Jensen et al., centrality metrics are calculated for the network nodes, and the important nodes are identified and placed at the center of the clusters. Nettleton et al. [18] use graph mining algorithms to improve social network clustering (based on the edge betweenness algorithm). In their research, a big dataset is generated by modeling a social network as a huge graph (where nodes represent individuals, organizations and groups), and big data analysis algorithms are used to find the opinion leaders of the social network.
In the paper by Abdulsalam et al. [19], a distributed data aggregation method with a clustering structure was presented. In their evaluation, the energy consumption of the data aggregation step was compared, and the cluster construction step was considered, for both algorithms (i.e. the proposed CDDA and the Leach algorithm). This method effectively reduces energy consumption compared to the Leach algorithm in environments with high data correlation, while in environments with little correlation between data the two algorithms perform almost identically. In [20], a clustering method to reduce energy consumption in wireless sensor networks is presented, which is described as the most effective method for scalability and reducing energy consumption in wireless sensor networks. The simulation results show that the algorithm presented by Sasirekha et al. can reduce the energy consumption of the wireless sensor network and significantly increase the lifetime of these networks. In [21], a hybrid multi-step routing algorithm is proposed, which combines hierarchical and planar multi-path routing algorithms. Its primary goal is to minimize energy consumption in wireless sensor networks. An approach to improving network clustering for efficient use of energy in wireless sensor networks is presented by Baghouri et al. [22].

3-Proposed Method
The main idea of the proposed algorithm is to modify the standard Leach algorithm so that the selected cluster heads correspond to the opinion leaders of the social network. In the standard Leach algorithm, a random number (between 0 and 1) is assigned to each node. If this number is less than a certain threshold, the node is selected as a cluster head.
To analyze this method, the centrality criteria introduced in the previous sections were calculated. The results showed that the output of the standard Leach algorithm has very little correlation with the calculated centralities. Therefore, changes were made to the standard Leach algorithm to improve the degree of correlation. In order to achieve better load balancing and a proper distribution of cluster heads, and to reduce the random effect of the standard Leach algorithm, the physical distance of nodes was added to the algorithm as a parameter. These changes lead to a meaningful correlation between the cluster heads selected by the modified Leach algorithm and the standard centralities. The pseudo-code of the proposed algorithm is as follows:
In the original version of the Leach protocol, all nodes are selected as cluster heads with a fixed probability, so some unsuitable nodes may be selected as cluster heads. Such choices may cause rapid energy consumption in important nodes (which play a main role in the network because they connect two subnets of the social network graph), and this causes the network to be partitioned. Some changes have therefore been made to improve the standard Leach algorithm. These changes are expressed in Eqs. (6)-(9).
= − , ∈ (9)
Using the distance criterion, the location of each node is taken into account, so the selection of cluster heads is expected to be no longer completely random. In addition, the time complexity of the modified algorithm is unchanged compared to the original Leach algorithm, because no new loops or similar constructs have been added in the modified version. The correlation coefficient can also be used to test the revised version.

4-Experimental results
After performing the simulations using the values and methods described, the results presented in this section were obtained. Correlation analysis is one of the most basic methods of statistical analysis. Its purpose is to measure the type of relationship and the degree of similarity between different attributes of the objects and phenomena being studied. The correlation coefficient is a statistical index that expresses the degree of relationship between two variables on a fixed and finite scale.
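As a rough illustration of such a coefficient, the sketch below computes the Pearson correlation between two lists of per-node scores. The paper does not state which correlation coefficient was used, so Pearson is an assumption here, and the function name `pearson` and the sample scores are hypothetical values invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-node scores, invented purely for illustration.
degree_scores = [3, 1, 1, 1, 2]
other_scores = [6, 2, 2, 2, 4]     # exactly twice degree_scores
r = pearson(degree_scores, other_scores)
```

The Pearson coefficient lies between -1 and +1. Values near +1 mean that two measures rank the nodes in nearly the same way, which is how a threshold such as 0.7 can be read.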

4-1 Correlation of standard datasets
In this section, correlations were calculated on 6 standard datasets. These datasets are social networks with sizes from 34 to 1490 nodes. Table 1 shows the degree of correlation between the centrality criteria on the standard datasets.

4-2 Correlation on artificial social networks
Table 2 presents the results of analyzing the degree of correlation between the proposed algorithm and the standard centralities. The datasets are generated randomly, with between 10 and 80 nodes. The similarity of the results to the standard centralities was computed. The proposed algorithm has a significant relationship with the standard centrality criteria, and the correlation of their output values was more than 0.7.
Centrality criteria are calculated using the NodeXL network analysis software. The output of the proposed algorithm is then correlated with the outputs of NodeXL for networks of 10 to 80 nodes. Table 2 shows the degree of correlation between the different centrality criteria compared to each other and compared to the modified Leach algorithm. As can be seen in Table 2, the Eigenvector-Closeness pair has the highest level of correlation and the Eigenvector-Leach pair has the lowest.

5-Conclusion
Conventionally, social networks are represented as graphs in which the nodes correspond to the actors of the social network and the edges indicate the relationships between the actors. Research shows that nodes in a network do not all have the same status and that their importance differs greatly. Likewise, nodes in different positions have different levels of influence on network survival. Therefore, evaluating the importance of nodes in complex networks is not only useful but also critical. In fact, the main challenge for social network analysis is to identify the key people in a social network. In most studies, these people are called "opinion leaders". In this paper, the opinion leaders of a social network are determined using the concept of centrality (degree, closeness, betweenness and eigenvector) and a modified version of the standard Leach algorithm. The main idea is to apply the well-known Leach algorithm in the field of social networks: its core function, selecting the cluster heads of a wireless sensor network, is modified in order to find the opinion leaders in social networks. The simulation results show that the proposed algorithm has a significant relationship with the standard centrality indicators and that the correlation of their output values is more than 0.7.