Multi-objective PSO with Pareto Neighborhood topology for Clustering

In this paper, a new technique named Pareto Neighborhood (PN) topology is integrated into the Multi-Objective Particle Swarm Optimization (MOPSO) algorithm to produce the MOPSO-PN algorithm. This technique iteratively selects a set of best solutions from the Pareto-optimal fronts and explores them in order to find better clustering results in the next iteration. MOPSO-PN was then used as a Multi-Objective Clustering Optimization (MOCO) algorithm and tested on various real-life and artificial datasets. Two scenarios were used to test the performance of MOPSO-PN for clustering. In the first scenario, MOPSO-PN used two cluster validity indices (Silhouette index and Overall Cluster Deviation) as objective functions, three test datasets, four algorithms for comparison, and the average Minkowski score as the metric for evaluating the final clustering result. In the second scenario, MOPSO-PN used three cluster validity indices (I-index, Con-index and Sym-index) as objective functions, 20 test datasets, ten algorithms for comparison, and the F-measure as the evaluation metric. In both scenarios, MOPSO-PN provided competitive clustering results and the correct number of clusters for all datasets.


INTRODUCTION
Clustering, or unsupervised classification, is a major research topic in machine learning, data analysis, data mining and pattern recognition. It is an integral part of the overall process of exploratory data analysis, providing tools for synthesizing, predicting, visualizing and interpreting a dataset in order to build homogeneous groups (clusters) of individuals based on their similarities or dissimilarities [17] [18]: two close individuals must belong to the same group, while two distant individuals must belong to different groups.
The clustering problem tends to be more difficult than classification, since no information is provided on the number of clusters k, their centers, or their configurations. Many multi-objective optimization algorithms have been used in the clustering domain, including Genetic Algorithms [27]. The Simple Genetic Algorithm (SGA) is the first mono-objective version of Genetic Algorithms [10], and the Multi-Objective Genetic Algorithm (MOGA) [11] is their multi-objective version.
Before continuing with the description of some multi-objective clustering methods, we should note that the majority of these methods deal with the compactness, the total symmetry and the connectedness of clusters. The compactness of a clustering measures how close the objects belonging to the same cluster are; the total symmetry of the clusters evaluates how symmetrically the candidate clusters are distributed around their centers; and the connectedness of the clusters measures how separated clusters are connected. A Multi-objective Particle Swarm Optimization and Simulated Annealing algorithm, MOPSOSA, was used in [1]. To produce good clustering solutions, this algorithm simultaneously optimizes three cluster validity indices: the DB-index, which takes into account the compactness of the clustering and is based on the Euclidean distance; the Sym-index, which looks for total symmetry and is centered on the point symmetry distance; and the Con-index, which examines connectedness and is based on the shortest distance. MOPSOSA was tested on UCI benchmark datasets including 5 real-life datasets (Iris, Cancer, Newthyroid, LiverDisorder and Glass), and on 14 artificial datasets (Sph-5-2, Sph-4-3, Sph-6-2, Sph-10-2, Sph-9-2, Pat1, Pat2, Long1, Sizes5, Spiral, Square1, Square4, Twenty and Fourty).
Multi-Objective Clustering with automatic determination of the number of clusters (k), MOCK [22], was applied to web mining. This algorithm optimizes two objective functions, the Overall Cluster Deviation and the Connectivity, which are based respectively on the compactness and the connectedness of the clusters. To evaluate MOCK, 9 test datasets were employed (Square1, Square4, Sizes5, Triangle1, Long1, Twenty, D2C10, Spiral and LongSquare).
A multi-objective artificial bee optimization algorithm called cOptBees-MO was proposed in [8], in which two scenarios were used. The first scenario optimizes two cluster validity indices (Silhouette and Overall Cluster Deviation), while the second optimizes three cluster validity indices: the I-index, the Con-index and the Sym-index. The evaluation of cOptBees-MO is based on 10 real-life datasets: three of them are used in the first scenario (Votes, Zoo and Soybean) and the rest (Iris, Cancer, Newthyroid, Wine, LiverDisorder, LungCancer and Glass) are used in the second scenario. A Hybrid Chain-Hypergraph P System for Multi-objective Ensemble Clustering, HCHPS-MOEC, was proposed in [37]. This system combines the advantages of chain and hypergraph topologies to establish three types of sub-systems: the reaction-chain-membrane subsystems, the local-communication-membrane subsystems and the global-ensemble-membrane subsystems. The reaction-chain-membrane subsystem implements the multi-objective clustering strategies using three new types of rules: the Nondominated-Object-Selection rules, the Crossover rules and the Mutation rules. The evaluation of HCHPS-MOEC is based on two artificial datasets and 17 UCI real-life datasets (Iris, Newthyroid, Wine, Diabetes, Bupa, Yeast, Glass, Cancer, Heartstatlog, Balancescale, Seeds, Aggregation, Vowel, WBC, Ecoli, Zoo and Heart).
A Reference Vector-based Multi-Objective Clustering for high-dimensional data, RVMOC, was proposed in [21] to optimize three clustering criteria (the intra-cluster dispersion, the inter-cluster separation and the negative Shannon entropy) using Pareto fronts with a fuzzy estimation of the best returned solution around the knee point, with an evaluation over the UCI benchmark datasets [20] [35]. A multi-objective cat swarm optimization algorithm, MOCSO, uses a cat swarm to search for the optimal clustering solution, which is represented by the position of a cat. During the search phase, the cat updates its position through two modes, the seeking mode and the tracing mode, using respectively a simulated annealing strategy and quantum theory. As objective functions, MOCSO uses the cohesion and the connectivity. For evaluation, MOCSO uses three artificial datasets (DS1 [14], DS2 [34] and DS3 [14]), four UCI real-life datasets (Iris, WDBC, Wine and Bcw) and a field working-condition dataset.
A Multi-Objective Quantum Moth Flame Optimization algorithm, MOQMFO, was proposed in [33]. MOQMFO combines the features of quantum theory and multi-objective moth flame optimization.
MOPSO-PN uses cluster validity indices [8] as objective functions in the multi-objective clustering process, in order to help find an optimal number of clusters and an optimal clustering solution, and the F-measure [31] as the metric for evaluating the final clustering solutions. Based on this metric, the performance of MOPSO-PN is then compared to the aforementioned state-of-the-art algorithms. This paper is organized as follows: in Section 2, clustering as well as multi-objective clustering (its bases and objective functions) is presented; in Section 3, the proposed MOPSO-PN method for clustering is detailed; in Section 4, experimental investigations of MOPSO-PN on different datasets are discussed and compared to ten similar proposals; finally, in Section 5, conclusions and perspectives are elaborated.

MULTI-OBJECTIVE CLUSTERING OPTIMIZATION (MOCO)
Clustering consists in decomposing a dataset D into several non-empty homogeneous subsets called clusters {C1, C2, …, Ck}, where k is the number of clusters, which may or may not be known a priori (Jain, 2010). Data in each cluster share common features according to similarity criteria (Jain & Dubes, 1988).
A data element d ∈ D belongs to one and only one cluster Ci.
Let P = {C1, C2, …, Ck} be a family of sets: P is a partition of D if and only if:
- the empty set is not in P: ∅ ∉ P;
- the union of the sets of P is D: D = C1 ∪ C2 ∪ … ∪ Ck;
- the intersection of any two sets of P is empty: Ci ∩ Cj = ∅ for i ≠ j, i, j = 1, 2, …, k.
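The three partition conditions above can be sketched as a small check routine (a minimal illustration; the function name `is_partition` is ours, not from the paper):

```python
def is_partition(P, D):
    """Check the three partition conditions for a family of sets P over dataset D."""
    sets = [set(c) for c in P]
    # Condition 1: no empty cluster.
    if any(len(c) == 0 for c in sets):
        return False
    # Condition 2: the union of all clusters is D.
    union = set()
    for c in sets:
        union |= c
    if union != set(D):
        return False
    # Condition 3: clusters are pairwise disjoint.
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if sets[i] & sets[j]:
                return False
    return True
```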

Definitions
A multi-objective optimization problem [7] [6] [9] [16] aims to optimize several contradictory objective functions {F1, F2, …, Fn}. Given a problem with n objective functions F1, …, Fn subject to minimization, the multi-objective optimization of this problem may be formulated as follows:
• Find x belonging to [Lower-Bound; Upper-Bound];
• Satisfying: min(F1(x)); min(F2(x)); …; min(Fn(x)).
In MOCO problems, cluster validity indices are generally used as objective functions. The key concepts of MOCO are detailed in the next paragraphs.

Domination:
In Multi-Objective Optimization (MOO) [12], the domination concept is used to compare solutions of the problem [16]. Given two clustering solutions S1 and S2 in the minimization case, S1 is said to be better than S2 if and only if S1 dominates S2, which holds when every objective value of S1 is lower than or equal to the corresponding value of S2, and at least one objective value of S1 is strictly lower. Formally, in the minimization case with x objective functions, S1 is said to dominate S2, denoted S1 < S2, iff:
1. ∀ i ∈ {1, …, x} : S1(i) ≤ S2(i);
2. ∃ j ∈ {1, …, x} : S1(j) < S2(j).
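The dominance test can be sketched directly from these two conditions (minimization case; objective vectors as lists):

```python
def dominates(s1, s2):
    """Minimization case: s1 dominates s2 iff s1 is no worse on every
    objective and strictly better on at least one."""
    no_worse = all(a <= b for a, b in zip(s1, s2))
    strictly_better = any(a < b for a, b in zip(s1, s2))
    return no_worse and strictly_better
```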

Pareto Optimal Solutions
In MOO problems (for example, a minimization problem with x objective functions), we cannot declare a single point to be the solution of the problem, because if we find a point with the best minimum value for one objective function, there is certainly another objective function whose value at this point is not minimal. Consequently, the solution of a MOO problem may not be a single point but rather a set of non-dominated solutions, the Pareto-optimal solutions.

Repository
The repository is a data structure containing the Pareto-optimal solutions, i.e. the non-dominated solutions of the problem, as in [6] [16]. It is iteratively updated with the new optimal solutions generated at each iteration; the update requires that new repository members are not dominated by the old members.
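A minimal sketch of such a repository update (minimization case): merge the candidates into the repository and discard every dominated member. The helper name `update_repository` is ours, not from the paper.

```python
def dominates(s1, s2):
    # Minimization: no worse on every objective, strictly better on one.
    no_worse = all(a <= b for a, b in zip(s1, s2))
    better = any(a < b for a, b in zip(s1, s2))
    return no_worse and better

def update_repository(repository, candidates):
    """Merge new candidate objective vectors into the repository,
    keeping only the non-dominated members."""
    merged = repository + [c for c in candidates if c not in repository]
    return [s for s in merged
            if not any(dominates(o, s) for o in merged if o is not s)]
```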

Objective functions in MOCO
Objective functions in MOCO are represented by cluster validity indices. In this paper, the MOPSO-PN algorithm uses five cluster validity indices as objective functions: the Silhouette, the Overall Cluster Deviation (Dev(C)), the I-index, the Con-index and the Sym-index.
The variables used in the descriptions of these cluster validity indices are presented in Table 1.

Silhouette
The silhouette method measures how well an object belongs to a given cluster; its maximal value (+1) means that the object is well matched to its cluster, while its minimal value (-1) means that the object would fit better in a neighboring cluster. The silhouette index of an object x is calculated using equation 3:

s(x) = (b(x) - a(x)) / max(a(x), b(x))     (3)

where a(x) is the mean distance of the object x to the other objects belonging to the same cluster as x, and b(x) is the minimum over the other clusters of the mean distance of x to the objects of that cluster.
As illustrated in equation 4, the silhouette index of a clustering is the mean of the silhouette indices of all objects.

Overall cluster deviation (Dev(C))
The Overall Cluster Deviation (Dev(C)), as presented in [8] [14], is the metric evaluating the compactness of a clustering. Dev(C) is the overall sum of the Euclidean distances between the cluster centers and their corresponding data objects, and is calculated as follows:

Dev(C) = Σ_{k=1}^{K} Σ_{x ∈ C_k} d(c_k, x)

where C is the set of clusters; K, n_k and c_k denote respectively the number of clusters, the number of objects in cluster k and the centroid of cluster k; and d(·,·) denotes the Euclidean distance between the centroid c_k and its corresponding object x.
Dev(C) should be minimized, since a lower value indicates a more compact clustering.
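As a minimal sketch (our own illustration, with clusters given as lists of points and centroids precomputed), Dev(C) is just a double sum of Euclidean distances:

```python
import math

def overall_cluster_deviation(clusters, centroids):
    """Dev(C): sum over all clusters of the Euclidean distances between
    each object and its cluster centroid (lower means more compact)."""
    total = 0.0
    for points, c in zip(clusters, centroids):
        for x in points:
            total += math.dist(x, c)
    return total
```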

I-index
The I-index measures the geometric repartition of the clusters relative to their centers using the Euclidean distance [8] [31]:

I(K) = ( (1/K) · (E_1 / E_K) · D_K )^p

where E_K = Σ_{k=1}^{K} Σ_{j=1}^{n_k} d(c_k, x_kj); E_1 is a constant for a given dataset (the value of E_K for K = 1); D_K is the maximum Euclidean distance between two cluster centers; and the power p is used to control the contrast between the different cluster configurations. In equations (7) and (8), c_k is the center of cluster k and x_kj stands for object j of cluster k.

Con-index
This cluster validity index measures the connectivity; it is based on the relative neighborhood graph, which measures how separated clusters are connected [8]. Here, d_min(c_k, x_kj) defines the minimum distance in the neighborhood graph between the center c_k of cluster k and the object x_kj (the j-th object of cluster k). A minimal value of the Con-index means that the clusters are compact and separated from their neighbors.

Sym-index
The Symmetry index, Sym-index, evaluates the symmetry of the clusters with respect to their centers:

Sym(K) = (1/K) · (1/E_K) · D_K

where K represents the number of clusters; D_K is the maximum Euclidean distance between two cluster centers (the cluster centers c_i and c_j are taken pairwise); E_K sums, over all clusters and objects, the point symmetry measure d_ps(x, c) of an object x with respect to its center c, where d_ps combines the symmetry distance with the Euclidean distance d_e(x, c) between x and c. The Sym-index is subject to maximization to get the correct number of clusters.

PROPOSED METHODS: MOPSO-PN ALGORITHM
In this section, the MOPSO-PN algorithm is presented, first as a heuristic improvement and second as a multi-objective clustering application.

PSO Algorithm
PSO [32] is a well-regarded optimization algorithm inspired by the social behavior of animals (particles). In PSO, the position Pos_i of a particle represents a solution to the problem. Optimization is performed by updating the position of each particle Pos_i in the swarm based on its velocity Vel_i, using equation (16); velocities are updated using equation (15):

Vel_i(iter+1) = W · Vel_i(iter) + c1 · r1 · (pbest_i - Pos_i(iter)) + c2 · r2 · (gbest - Pos_i(iter))   (15)
Pos_i(iter+1) = Pos_i(iter) + Vel_i(iter+1)   (16)

where Pos_i and Vel_i denote respectively the position and the velocity of the i-th particle; iter denotes the iteration number; c1 is the cognitive factor and c2 the social factor; r1 and r2 are two numbers chosen randomly from [0, 1]; pbest_i is the local best according to a given topology; gbest is the global best position in the swarm; and W stands for the inertia weight.
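One PSO update step for a single particle (equations 15 and 16) can be sketched as follows, assuming real-valued position vectors; the function name `pso_step` and the default parameter values are ours, not from the paper:

```python
import random

def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One PSO update for a single particle in R^d:
    velocity update (Eq. 15) followed by position update (Eq. 16)."""
    r1, r2 = random.random(), random.random()
    new_vel = [w * v + c1 * r1 * (pb - p) + c2 * r2 * (gb - p)
               for p, v, pb, gb in zip(pos, vel, pbest, gbest)]
    new_pos = [p + v for p, v in zip(pos, new_vel)]
    return new_pos, new_vel
```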

Multi-Objective PSO
The multi-objective form, MOPSO [7] [16], combines the characteristics of the original algorithm with multi-objective optimization by applying the concept of dominance to store the best solutions obtained during each generation in a repository: the repository is a set of non-dominated solutions, the Pareto-optimal solutions. In [9], a non-dominated sorting policy is used to detect the Pareto front candidates, then crowding-distance sorting is used to reduce the repository size for a better detection of the best solution. The best solution is selected from the Pareto-optimal solutions through a technique based on the distance between the Pareto points and an ideal point called the utopia point; the best solution is then defined as the point closest to the utopia point [8]. As shown in Figure 1, the green circles represent the Pareto-optimal solutions (repository members); the yellow circle represents the utopia point, which is the intersection of the line parallel to the F2 axis passing through the solution with the minimum value of F1 and the line parallel to the F1 axis passing through the solution with the minimum value of F2; the best solution, defined as the point closest to the utopia point, is represented by the orange circle.
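The utopia-point selection described above can be sketched as follows (minimization case; the utopia point collects the per-objective minima of the front, and the best solution is the front member at minimal Euclidean distance to it — `best_by_utopia` is our own name):

```python
import math

def best_by_utopia(front):
    """Select the front member closest (Euclidean distance) to the
    utopia point, built from the per-objective minima of the front."""
    utopia = [min(s[i] for s in front) for i in range(len(front[0]))]
    return min(front, key=lambda s: math.dist(s, utopia))
```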

The Pareto Neighborhood Topology
Assume a swarm of N1 particles and a repository of best solutions of size N2 < N1. In classical MOPSO, at each iteration, the local solutions in the swarm are handled with the native PSO mechanism, the repository is updated with the new optimal solutions from the swarm, and the best solution in the repository is assumed to be the solution closest to the utopia point. The proposed Pareto Neighborhood (PN) intrinsically changes the local solutions of the particles in the swarm by replacing a given number N3 of particle solutions with solutions from the repository. More precisely, the PN consists in selecting a neighborhood of N3 solutions (N3 ≤ N2) from the Pareto-optimal solutions (repository members) that are closest to the utopia point and different from the best Pareto solution, and assigning them to N3 particle solutions in the swarm. Figure 2 shows how the PN is managed: with N3 = 5, the five Pareto neighborhood solutions (green circles) closest to the utopia point (yellow circle) and different from the best Pareto solution (orange circle) are selected from the N2 = 11 Pareto-optimal solutions and assigned to N3 = 5 particles in the swarm of size N1 = N2 × 2 = 22. The PN may be used with any multi-objective meta-heuristic; it limits the side effects of an imprecise utopia point detection. It may be symmetric or non-symmetric: in the symmetric case, a similar number of points is selected on each side of the Pareto best solution, where the first side tends to better serve the first optimality criterion while the second side better serves the second optimality criterion. These neighbors of the Pareto best solution are used as local optima in the next iteration of the multi-objective meta-heuristic. The neighborhood size N3 should be less than the swarm size N1 of the meta-heuristic used. Finally, the corresponding Pareto particles are inserted as local bests for the next iterative processing, as in Figure 3.
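The PN selection step can be sketched as follows (a minimal illustration under our own naming: rank the front by distance to the utopia point, take the closest member as the best Pareto solution, and the next N3 members as the Pareto neighborhood):

```python
import math

def pareto_neighborhood(front, n3):
    """Return the best Pareto solution and the N3 members closest to the
    utopia point, excluding the best solution itself (minimization case)."""
    utopia = [min(s[i] for s in front) for i in range(len(front[0]))]
    ranked = sorted(front, key=lambda s: math.dist(s, utopia))
    best, neighbors = ranked[0], ranked[1:1 + n3]
    return best, neighbors
```

The returned `neighbors` would then overwrite the local bests of N3 particles in the swarm before the next PSO update.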

MOPSO-PN for Clustering
To use MOPSO-PN for clustering, each clustering solution (the set of cluster centers for the dataset) is encoded in the position of a particle Pos_i, which is represented by a matrix of k rows and d columns, where k is the number of clusters and d is the dimension (the number of features) of the dataset. Each row vector of this matrix represents a cluster center, as follows.
C_ij is a vector of size d; it represents the j-th cluster center of the i-th particle. The fitness F of each particle is represented by a vector of size n (n is the number of objective functions). In the first scenario, there are two objective functions represented by two cluster validity indices (the Silhouette and the Overall Cluster Deviation), so n = 2 and consequently F = [F1, F2]; in the second scenario, there are three objective functions represented by three cluster validity indices (I-index, Con-index and Sym-index), so n = 3 and consequently F = [F1, F2, F3].

Architecture of MOPSO-PN for clustering
The architecture of the MOPSO-PN algorithm for clustering is represented in Figure 4. As input, MOPSO-PN receives the dataset to be processed, the initialization parameters and the objective functions (two objective functions in the first scenario and three in the second scenario). As output, MOPSO-PN provides the number of clusters of the supplied dataset, the cluster centers and the clustered dataset.

Second scenario with three objective functions: after defining the input data (the dataset to cluster "Dataset", the maximum number of iterations "MaxIteration", the number of particles "N1", the repository size "N2" (N2 = N1/2), the neighborhood size "N3" (N3 ≤ N2) for the Pareto Neighborhood, and the initial parameters (w, c1, c2)), the steps of MOPSO-PN for clustering are detailed in Figure 5. Step 1 consists in defining the range of values of k: k ∈ {k1, …, kn}. Step 10 consists in selecting the best value of k, with the maximum value of the F-measure, and its corresponding optimal solution, to produce as output the best value of k, the best F-measure, the best positions of the cluster centers (particle positions) and, consequently, the dataset clustered according to those cluster centers. The iterative steps are the following:

GBest ← Best-Solution
6.1) Apply the Pareto Neighborhood topology.
6.2) Update the velocity and position of each particle as in equations (15) and (16).
6.3) Calculate the new cost of each particle.
6.5) Determine the domination between particles and fill the repository with the non-dominated particles.
6.6) Determine the domination of the new repository members.
6.7) Keep only the non-dominated members in the repository.
6.8) Check whether the repository is full and keep only N2 members:
    If Size(Rep) > N2 // the current size of the repository exceeds its initial size
        Define the "Utopia-Point" in the repository;
        Rep ← the N2 solutions of Rep closest to the "Utopia-Point";
    Else // do nothing
    End If
6.9) Select the final "Best-Solution" (the best clustering solution of this iteration) from the repository, as the point closest to the "Utopia-Point".
6.10) Calculate the "F-measure" (obtained after classifying the dataset based on "Best-Solution"). In each iteration, this "Best-Solution" and its corresponding "F-measure" are stored.
iter ← iter + 1.
End While
9) Select the "Best-Solution" with the maximum "F-measure" from the set of stored "Best-Solution"/"F-measure" pairs obtained at the end of all iterations.
End For
10) Select the best value of k, with the maximum value of the "F-measure", and its corresponding "Best-Solution".
Output: number of clusters k + "Best-Solution" having the maximum "F-measure".

EXPERIMENTAL INVESTIGATIONS
This section evaluates the performance of the MOPSO-PN clustering algorithm on 9 real-life datasets and 14 artificial datasets, and compares it with state-of-the-art algorithms.

Test Scenarios
To evaluate the performance of MOPSO-PN, two scenarios were developed; they are detailed in the next paragraphs.

First test scenario
In this scenario, the MOPSO-PN algorithm used two cluster validity indices as objective functions (the Silhouette and the Overall Cluster Deviation Dev(C)) and three real-life datasets (Votes, Zoo and Soybean) for the test. The MOPSO-PN algorithm was executed 30 times with the input parameters shown in Table 3, and its results are compared with those of Cunha et al. [8].

RESULTS AND DISCUSSIONS
In this section, we present and discuss the results provided by MOPSO-PN.

Results and discussions of the first scenario
The average Minkowski scores of MOPSO-PN and related algorithms from the literature on the Votes, Zoo and Soybean datasets are shown in Table 4. The best results (lower values) are marked in bold. Here, we distinguish two versions of cOptBees-MO: cOptBees-MO1 and cOptBees-MO2. The results in Table 4 show that, although cOptBees-MO in its second version, which optimizes the Silhouette index and the number of clusters, gives the best results for all datasets, it does not give good results in its first version, which optimizes the Silhouette index and Dev(C); using these last two indices, MOPSO-PN performs well.

Results and discussions of the second scenario
The MOPSO-PN algorithm was applied to 6 real-life datasets and 14 artificial datasets.
Comparisons are based on the F-measure values and the numbers of clusters (k) obtained with MOPSO-PN and its competitors. Table 5 shows the F-measure values and the numbers of clusters provided by each algorithm. The best results (higher values) are marked in bold.
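The F-measure used here as an external evaluation metric can be sketched as follows (a common definition for clustering evaluation: each true class takes the best F score over all clusters, weighted by class size; the exact variant used in [31] may differ):

```python
def clustering_f_measure(true_labels, cluster_labels):
    """Overall F-measure of a clustering against ground-truth labels.

    For each true class, precision and recall are computed against every
    cluster, the best F score is kept, and the scores are averaged
    weighted by class size (higher is better, 1.0 is perfect)."""
    n = len(true_labels)
    score = 0.0
    for ci in set(true_labels):
        n_i = sum(1 for t in true_labels if t == ci)
        best = 0.0
        for cj in set(cluster_labels):
            n_j = sum(1 for c in cluster_labels if c == cj)
            n_ij = sum(1 for t, c in zip(true_labels, cluster_labels)
                       if t == ci and c == cj)
            if n_ij:
                prec, rec = n_ij / n_j, n_ij / n_i
                best = max(best, 2 * prec * rec / (prec + rec))
        score += (n_i / n) * best
    return score
```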
These results are illustrated in Table 5. The clustering results on the artificial datasets obtained with the MOPSO-PN clustering algorithm are presented in Figure 6: Sph-5-2 in Figure 6a, Sph-4-3 in Figure 6b, Sph-6-2 in Figure 6c, Sph-10-2 in Figure 6d, Sph-9-2 in Figure 6e, Pat1 in Figure 6f, Pat2 in Figure 6g, Long1 in Figure 6h, Sizes5 in Figure 6i, Spiral in Figure 6j, Square1 in Figure 6k, Square4 in Figure 6l, Twenty in Figure 6m and Fourty in Figure 6n.