Randomized heuristic for the maximum clique problem

A clique in a graph is a set of vertices that are all directly connected to each other, i.e., a complete sub-graph. A clique of the largest size is called a maximum clique. Finding a maximum clique in a graph is NP-hard, and unless P = NP no polynomial-time approximation algorithm can guarantee a solution within a constant factor of the optimum. In this work, we present a simple and very fast randomized algorithm for the maximum clique problem. We also provide Java code for the algorithm in our git repository. Results show that the algorithm finds reasonably good solutions on a random sample of DIMACS benchmark graphs. Rather than aiming for optimality, we aim to find good solutions very quickly.


Introduction
The maximum clique problem is one of the harder NP-hard problems in graph theory. It has applications in several domains, such as computer science, recommender systems and operations research. A clique is a complete sub-graph, i.e., every vertex in it is connected to every other vertex in it. A maximum clique is a clique of maximum size. A graph may have more than one maximum clique.
The size of any clique is a lower bound on the chromatic number in a graph coloring problem, since the vertices of a clique must all receive different colors.
The maximum clique problem is a well-studied problem. References [1] and [2] are good introductory articles on it. Reference [3] provides a fast algorithm and presents results for several DIMACS graphs, and [4] is a survey of maximum clique algorithms.
Rather than aiming to find the maximum clique, this randomized algorithm tries to find large cliques that can then be used in other algorithms. For instance, it is used in [5] as the first step of a graph coloring algorithm, with good results. We also provide Java code in a git repository [6]. If optimality is important, we refer the reader to the following implementation [7]: https://github.com/shah314/clique

Methods
A graph G(V, E) consists of a set of nodes V and a set of edges E. Each edge in E connects two vertices. A graph can be stored as an adjacency matrix or as adjacency lists. We use adjacency lists, whose memory use grows with the number of edges rather than with the square of the number of vertices, which makes the implementation better suited to very large graphs.
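As an illustration of the adjacency list representation, the following Java sketch stores one neighbor set per vertex and fills it from lines in the DIMACS edge format ("p edge n m" followed by "e u v" lines with 1-based vertex numbers). The class name SketchGraph and all of its methods are hypothetical names chosen for this example; they are not taken from the repository, whose actual data structures may differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative adjacency-list graph. Hypothetical names, not the repository's API. */
final class SketchGraph {
    // adj.get(v) holds the neighbors of vertex v (vertices are 0-based here).
    private final List<Set<Integer>> adj = new ArrayList<>();

    SketchGraph(int n) {
        for (int i = 0; i < n; i++) adj.add(new HashSet<>());
    }

    int size() { return adj.size(); }

    /** Adds an undirected edge; self-loops are ignored. */
    void addEdge(int u, int v) {
        if (u == v) return;
        adj.get(u).add(v);
        adj.get(v).add(u);
    }

    boolean hasEdge(int u, int v) { return adj.get(u).contains(v); }

    /** Degree = number of vertices in the neighborhood of v. */
    int degree(int v) { return adj.get(v).size(); }

    Set<Integer> neighbors(int v) { return adj.get(v); }

    /** Builds a graph from DIMACS edge-format lines: "c" comments, "p edge n m", "e u v". */
    static SketchGraph fromDimacs(List<String> lines) {
        SketchGraph g = null;
        for (String line : lines) {
            String[] t = line.trim().split("\\s+");
            if (t[0].equals("p")) {
                g = new SketchGraph(Integer.parseInt(t[2]));
            } else if (t[0].equals("e") && g != null) {
                // DIMACS vertices are 1-based; shift to 0-based indices.
                g.addEdge(Integer.parseInt(t[1]) - 1, Integer.parseInt(t[2]) - 1);
            }
        }
        if (g == null) throw new IllegalArgumentException("missing 'p' line");
        return g;
    }
}
```

With this layout, memory is proportional to |V| + |E|, and an adjacency test is a single hash lookup.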
We first read the DIMACS graph and sort the vertices in decreasing order of degree. The degree of a node is the number of nodes in its neighborhood. We then initialize a clique with the node of highest degree. Next, within the immediate neighborhood of this first vertex, we complete the clique iteratively: in decreasing order of degree, we add each vertex that is adjacent to every node already in the clique. We call this initial clique gBest.
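The construction of the initial clique can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code from the repository: the class name GreedyCliqueSketch, its method names, and the List<Set<Integer>> adjacency representation are hypothetical, and ties between vertices of equal degree are broken by vertex index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative greedy clique construction. Hypothetical names, not the repository's API. */
final class GreedyCliqueSketch {

    /** Returns the vertices sorted by decreasing degree (stable, so ties keep index order). */
    static List<Integer> byDecreasingDegree(List<Set<Integer>> adj) {
        List<Integer> order = new ArrayList<>();
        for (int v = 0; v < adj.size(); v++) order.add(v);
        order.sort(Comparator.comparingInt((Integer v) -> adj.get(v).size()).reversed());
        return order;
    }

    /** Extends clique in place: adds every vertex adjacent to all current members. */
    static void complete(List<Set<Integer>> adj, List<Integer> order, Set<Integer> clique) {
        for (int v : order) {
            if (!clique.contains(v) && adj.get(v).containsAll(clique)) {
                clique.add(v);
            }
        }
    }

    /** Starts from the highest-degree vertex and completes the clique greedily. */
    static Set<Integer> initialClique(List<Set<Integer>> adj) {
        Set<Integer> clique = new HashSet<>();
        // With an empty clique the first vertex in the order (highest degree) is added;
        // every later vertex must then lie in the neighborhood of all chosen vertices.
        complete(adj, byDecreasingDegree(adj), clique);
        return clique;
    }
}
```

The result is a maximal clique: no further vertex can be added to it, although a larger clique may exist elsewhere in the graph.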
Next, we repeatedly remove two randomly chosen nodes from the clique and greedily complete it again. If the new clique is larger, we update gBest. We call these 2-opt moves.
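A 2-opt move of this kind might look like the Java sketch below. Several details are assumptions made for the example rather than facts about the repository: the class name TwoOptSketch, the fixed iteration budget, restarting every move from the current gBest, and completing the clique by scanning vertices in decreasing degree order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Illustrative 2-opt improvement loop. Hypothetical names, not the repository's API. */
final class TwoOptSketch {

    static Set<Integer> improve(List<Set<Integer>> adj, Set<Integer> start,
                                int iterations, Random rng) {
        // Vertices in decreasing degree order, used for the greedy completion.
        List<Integer> order = new ArrayList<>();
        for (int v = 0; v < adj.size(); v++) order.add(v);
        order.sort(Comparator.comparingInt((Integer v) -> adj.get(v).size()).reversed());

        Set<Integer> gBest = new HashSet<>(start);
        for (int it = 0; it < iterations && gBest.size() >= 2; it++) {
            List<Integer> members = new ArrayList<>(gBest);
            Set<Integer> trial = new HashSet<>(gBest);
            // Remove two distinct, randomly chosen members from the trial clique.
            trial.remove(members.remove(rng.nextInt(members.size())));
            trial.remove(members.remove(rng.nextInt(members.size())));
            // Greedily complete: add every vertex adjacent to all current members.
            for (int v : order) {
                if (!trial.contains(v) && adj.get(v).containsAll(trial)) {
                    trial.add(v);
                }
            }
            // Keep the trial only if it is strictly larger than the best so far.
            if (trial.size() > gBest.size()) gBest = trial;
        }
        return gBest;
    }
}
```

Because a trial is accepted only when it is strictly larger, the size of gBest never decreases, and every clique returned is maximal.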
Then, for each vertex in the graph, we perform 1-opt moves to improve the clique.
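The text does not spell out what a 1-opt move does, so the Java sketch below shows only one plausible reading, labeled as an assumption: for each vertex v outside the clique, if at most one clique member is not adjacent to v, that member is swapped out for v and the clique is greedily completed, and the result is kept only if it is larger. The class name OneOptSketch and the method signature are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative 1-opt (single swap) pass. An assumed interpretation, not the repository's code. */
final class OneOptSketch {

    static Set<Integer> improve(List<Set<Integer>> adj, Set<Integer> start) {
        // Vertices in decreasing degree order, used for the greedy completion.
        List<Integer> order = new ArrayList<>();
        for (int v = 0; v < adj.size(); v++) order.add(v);
        order.sort(Comparator.comparingInt((Integer v) -> adj.get(v).size()).reversed());

        Set<Integer> gBest = new HashSet<>(start);
        for (int v = 0; v < adj.size(); v++) {
            if (gBest.contains(v)) continue;
            // Clique members that are not adjacent to v and would have to leave.
            List<Integer> blockers = new ArrayList<>();
            for (int u : gBest) {
                if (!adj.get(v).contains(u)) blockers.add(u);
            }
            if (blockers.size() > 1) continue; // assumed rule: at most one vertex may leave
            Set<Integer> trial = new HashSet<>(gBest);
            trial.removeAll(blockers);
            trial.add(v);
            // Greedily complete the swapped clique.
            for (int w : order) {
                if (!trial.contains(w) && adj.get(w).containsAll(trial)) {
                    trial.add(w);
                }
            }
            if (trial.size() > gBest.size()) gBest = trial;
        }
        return gBest;
    }
}
```

A swap alone leaves the clique size unchanged; the gain, if any, comes from vertices that become addable once the blocking member is gone.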
As the results section shows, the algorithm finds good solutions on randomly chosen DIMACS instances.

Results
Table 1 shows the results of running the code on a random sample of DIMACS graphs. The Java implementation finds reasonably good cliques very quickly for almost all of the instances. Because the graph is stored as adjacency lists, the algorithm can also be applied to very large graphs.

Conclusion
In this work, we presented a simple, fast randomized search heuristic for computing large cliques in fairly large graphs. It is not the goal of this work to find maximum cliques. Rather, the goal is to find large maximal cliques for use in further applications. For example, [5] uses our algorithm as the first step of an algorithm to find the chromatic number of a graph, since the clique number is a lower bound on the chromatic number. We also provide Java code for our algorithm, which is simple and easy to use.