Preprint

Probabilistic Cellular Automata Monte Carlo for the Maximum Clique Problem


A peer-reviewed article of this preprint also exists.

This version is not peer-reviewed

Submitted: 30 August 2024

Posted: 30 August 2024

Abstract
We consider the problem of finding the largest clique of a graph. This problem is NP-hard, and no algorithm is known that solves it exactly in polynomial time. Several heuristic approaches have been proposed to find approximate solutions, Markov Chain Monte Carlo being one of them. In the context of Markov Chain Monte Carlo, we present a class of “parallel dynamics”, known as Probabilistic Cellular Automata, which can be used in place of the more standard choice of sequential “single spin flip” to sample from a probability distribution concentrated on the largest cliques of the graph. We perform a numerical comparison between the two classes of chains, both in terms of the quality of the solution and in terms of computational time. We show that the parallel dynamics are considerably faster than the sequential one while providing solutions of comparable quality.
Subject: Computer Science and Mathematics - Probability and Statistics

1. Introduction

A graph is a combinatorial structure consisting of a set of objects called vertices or nodes together with a collection of links, called edges, between pairs of vertices. Graphs may be used to model the relationship between pairs of entities of a population such as those between individuals in a social network or between data points in a dataset.
A typical question in decision theory is: “Given a graph G with set of vertices V, is there a subset A ⊆ V with cardinality k such that all pairs of vertices in A are connected by an edge?”. This problem, known as the clique problem, is NP-complete (it is, in fact, one of the 21 problems identified by Karp in [1]).
Its formulation as an optimization problem, “what is the largest clique of a given graph G?”, is known as the Maximum Clique Problem and is NP-hard.
Though it is easy to verify that a collection of k nodes is indeed a clique for a graph G (where “easy” means that this verification can be done in polynomial time - this is actually the definition of an NP problem), no polynomial time algorithm is known to exist to determine the size of the largest clique of a graph. For this reason, in many situations where computation time is limited, finding an optimal clique using an exact algorithm is usually not a viable option. In these cases, having an algorithm capable of finding a good heuristic solution in a short time may be extremely valuable.
Beyond the exact algorithms, which, in the general case, are not suitable for large instances, several approaches have been used to tackle the clique problem such as greedy procedures, Monte Carlo methods, machine learning and artificial intelligence tools, and quantum computing algorithms. A recent overview can be found in [2].
Here, our focus is on Monte Carlo methods. The idea behind Monte Carlo methods consists in the ability to sample collections of nodes from a probability distribution which puts most of the weight on the largest cliques of a given graph. This probability distribution is usually approximated by the long-term distribution of a suitable Markov Chain (see [3] for an elementary introduction).
In the context of the Maximum Clique problem, Markov Chain Monte Carlo (MCMC) methods have been analyzed extensively from both the theoretical and numerical point of view. Even though some limitations of MCMC were highlighted in [4], more recent results, such as those in [5,6,7], showed that this approach may be worthwhile for tackling practical instances.
This work aims to evaluate, numerically, a class of Markov Chain Monte Carlo algorithms, which here we refer to as Probabilistic Cellular Automata Monte Carlo (PCAMC), where the Markov chain lives on some multidimensional discrete space and, at each step, all the components of the “current state” are updated independently of one another. This is in contrast with the usual choice in MCMC of updating only one component at each step of the dynamics. The inherent parallel nature of PCAMC allows for a straightforward implementation that takes full advantage of parallel computing architectures, giving PCAMC a significant advantage over standard MCMC methods from the computational point of view.
These algorithms have already been considered (by the author of this paper and several coauthors) to tackle certain instances of the Quadratic Unconstrained Binary Optimization (QUBO) problem and proved to be rather effective. Here we use them in the context of the Maximum Clique Problem and show that their performance is quite promising.

2. Materials and Methods

In this section, we give a mathematical definition of the maximum clique problem and see how it can be formulated in terms of the minimization of a suitable objective function. We then describe the Monte Carlo algorithms we want to evaluate, namely the one we refer to as the PCA (which throughout the paper we denote by A) and the Shaken dynamics (denoted by S), as well as the algorithm we take as a reference, that is, the Metropolis “single spin flip” algorithm (which we denote by M).

2.1. Preliminary Definitions and Statement of the Problem

Let G = (V, E) be a non-oriented graph where V is the set of vertices and E ⊆ V × V is the set of edges. A graph G is said to be complete if E = {{i, j} for all i, j ∈ V, i ≠ j}, that is, if there is an edge between each pair of vertices of G. A graph G′ = (V′, E′) is called a subgraph of G if V′ ⊆ V and E′ ⊆ E. Let A ⊆ V be a subset of the vertices of G. Then the graph G[A] with set of vertices A and set of edges E|_A = {{i, j} ∈ E : i, j ∈ A} (that is, the set of edges of G with both endpoints in A) is called the subgraph of G induced by A. The subset C ⊆ V is called a clique for G if the induced subgraph G[C] is complete. We denote by K(G) the set of all cliques of G. A clique C is said to be maximal for G if, for all C′ ⊆ V such that C ⊊ C′, C′ is not a clique. In words, this means that C is maximal if it cannot be extended by adding an adjacent vertex. Further, a clique C is called a maximum clique of G if |C′| ≤ |C| for all C′ ∈ K(G), where |S| denotes the cardinality of the set S. The cardinality of a maximum clique of G is called the clique number of G and is denoted by ω(G).
Given a graph G, the maximum clique problem consists in determining the clique number ω ( G ) , that is, the size of the largest clique of the graph. As mentioned in the introduction, this problem is NP-hard.
A graph G = (V, E) can be encoded in several ways. One possibility is that of considering its adjacency matrix A_G = (a_ij), i, j = 1, …, N, defined by a_ij = 1 if {i, j} ∈ E and a_ij = 0 otherwise. Note that A_G is a symmetric matrix since G is non-oriented.

2.2. A Hamiltonian for the Clique Problem

A discrete optimization problem consists in finding the minimum of an objective function H ( η ) where η is an element of a suitable multidimensional space X .
Consider a graph G = (V, E), with vertices V = {1, 2, …, N}, and the space of boolean configurations with N components X = {0, 1}^N. Then, for each subgraph G[A], with A ⊆ V, there is one, and only one, η^A ∈ X defined by η_i^A = 1 if and only if i ∈ A. We call the set of indices where η is positive the support of η and denote it by supp(η) (in words, supp(η) is the set of vertices selected by the configuration η). We can, therefore, identify subgraphs of a graph with N vertices with configurations of {0, 1}^N. With an abuse of notation we write “the subgraph η” in place of the subgraph G[A] induced by the set A = {i ∈ 1, …, N : η_i = 1}. Similarly, if the vertices of subgraph η are a clique of G we refer to η as a clique of G. Moreover, we write |η| for the cardinality of a configuration η, defined as |η| = |{i ∈ 1, …, N : η_i = 1}|.
For any graph G = (V, E) with N vertices it is possible to define the N × N matrix J = J(G) as J_ij = 1/2 if {i, j} ∉ E (with i ≠ j) and J_ij = 0 otherwise. We call the matrix J(G) defined in this way the matrix of missing edges of G. Note that the matrix J is symmetric and is closely related to the adjacency matrix of the graph. In particular, if A = A(G) is the adjacency matrix of the graph, then J_ij = (1/2)(1 − A_ij) if i ≠ j and 0 otherwise. Therefore, any symmetric matrix J with zeroes on the diagonal and entries in {0, 1/2} uniquely determines a graph G(J). Moreover, when it does not give rise to confusion, we drop the explicit dependence of J on G.
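As an illustration, the missing-edges matrix can be assembled directly from an edge list. The following Python sketch (the paper's own code is in Julia; the graph and the function name here are ours) builds J for a small hypothetical graph:

```python
import numpy as np

def missing_edges_matrix(n, edges):
    """Matrix of missing edges: J[i, j] = 1/2 if {i, j} is NOT an edge
    of the graph (i != j), and 0 otherwise.  Vertices are 0-based."""
    A = np.zeros((n, n))              # adjacency matrix A(G)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    J = 0.5 * (1.0 - A)               # J_ij = (1 - A_ij) / 2 off the diagonal
    np.fill_diagonal(J, 0.0)          # zero diagonal
    return J

# toy graph: triangle {0, 1, 2} plus vertex 3 attached only to vertex 2
J = missing_edges_matrix(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
```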
Given a graph G = ( V , E ) consider the associated matrix of missing edges J and define the function
$$H_{J,\lambda}(\eta) = \sum_{i,j=1}^{N} J_{ij}\,\eta_i\eta_j \;-\; \lambda\sum_{i=1}^{N}\eta_i. \qquad (1)$$
In the remainder of the paper, to lighten the notation, we will write H instead of H J , λ .
Lemma 1.
Let G = (V, E) be a non-oriented graph. If 0 < λ < 1, then H(η) is minimal if and only if η is a maximum clique of G.
Proof. 
Let us first show that the condition 0 < λ < 1 is sufficient.
Let η^k be a configuration with cardinality k. Then H(η^k) = −λk if η^k is a clique and H(η^k) > −λk otherwise, since the term Σ_{i,j=1}^N J_ij η_i η_j is strictly positive for all η's that are not cliques (as there is at least one “missing edge”).
Let ω(G) be the clique number of G, that is, the cardinality of the largest clique of G. Then there exists a clique η̄ with cardinality ω(G) such that H(η̄) = −λω(G). From the previous observation, it follows that H(η) ≥ −λω(G) for all η's with cardinality at most ω(G), with equality only if η is a maximum clique for G.
It is left to show that, if the cardinality of η is larger than ω(G), then H(η) > −λω(G). Let η have cardinality k > ω(G). Write S = supp(η) and denote by A the maximum clique of G[S]. Further, let B = S \ A and call η^A and η^B the configurations whose supports are, respectively, A and B. Note that A and B are disjoint by construction and |B| ≥ 1 since, by assumption, |A| + |B| > ω(G) and |A| ≤ ω(G). We have
$$H(\eta) = H(\eta^A) + H(\eta^B) + \sum_{i \in A,\, j \in B} J_{ij}\,\eta_i\eta_j.$$
Now observe that, by the assumed maximality of clique A, Σ_{i∈A, j∈B} J_ij η_i η_j ≥ |B|, since there must be at least one missing edge between each vertex in B and the vertices of A. Moreover, it must be that H(η^A) ≥ −λω(G) and H(η^B) ≥ −λ|B|. Hence
$$H(\eta) \geq -\lambda\,\omega(G) + (1-\lambda)\,|B| > -\lambda\,\omega(G)$$
as soon as λ < 1.
To see that the condition λ < 1 is necessary, consider the case where η is a clique of size k. Then H(η) = −λk. Let A be the set of indices such that η_i = 1 (that is, the set of vertices of the clique). Consider the configuration τ obtained from η by setting τ_j = 1 for some j ∉ A, and suppose that there is an edge between vertex j and k − 1 of the vertices in A. Then H(τ) = −λ(k + 1) + 1, so that H(τ) ≤ H(η) as soon as λ ≥ 1.
Finally, observe that if λ = 0, then H(η) = 0 for all η's that are a clique, and if λ < 0 only the empty configuration 0 has energy H(0) = 0, whereas H(η) > 0 for all η ≠ 0.
   □
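Lemma 1 can be checked by brute force on a small instance. The sketch below (Python, on a toy graph of our own, not from the paper) enumerates all configurations of a 4-vertex graph and verifies that, for 0 < λ < 1, the unique minimizer of H is the maximum clique:

```python
import itertools
import numpy as np

def H(eta, J, lam):
    # H_{J,lambda}(eta) = sum_{ij} J_ij eta_i eta_j - lambda sum_i eta_i
    return eta @ J @ eta - lam * eta.sum()

# toy graph: triangle {0, 1, 2} plus vertex 3 attached only to vertex 2
n, edges = 4, [(0, 1), (0, 2), (1, 2), (2, 3)]
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
J = 0.5 * (1.0 - A)          # matrix of missing edges
np.fill_diagonal(J, 0.0)

lam = 0.25
configs = [np.array(c, dtype=float) for c in itertools.product([0, 1], repeat=n)]
energies = [H(c, J, lam) for c in configs]
minimizers = [c for c, e in zip(configs, energies) if np.isclose(e, min(energies))]
# the unique minimizer selects exactly the maximum clique {0, 1, 2}
```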
Thanks to the previous lemma, the maximum clique problem can be formulated in a standard optimization problem form as
$$\min_{\eta \in \{0,1\}^N} H(\eta) = \min_{\eta \in \{0,1\}^N}\Big[\sum_{i,j=1}^{N} J_{ij}\,\eta_i\eta_j - \lambda\sum_{i=1}^{N}\eta_i\Big], \qquad 0 < \lambda < 1.$$
Note that, exploiting the fact that η_i = η_i², it is possible to rewrite the previous formula in terms of a matrix J̃ = J̃(λ) as
$$\min_{\eta \in \{0,1\}^N} H(\eta) = \min_{\eta \in \{0,1\}^N} \sum_{i,j=1}^{N} \tilde{J}_{ij}\,\eta_i\eta_j = \min_{\eta \in \{0,1\}^N} \eta^T \tilde{J}\,\eta \qquad (5)$$
with J̃_ij = J_ij if i ≠ j and J̃_ii = −λ. This formulation is known in the literature as the Quadratic Unconstrained Binary Optimization Problem (QUBO): an NP-hard problem with a vast range of applications (see the text [8] for an overview) that allows for a straightforward embedding (beyond the maximum clique problem) of several combinatorial optimization problems such as max-cut and graph coloring (see [9]). Further, QUBO is a model of interest in the field of quantum computing (see, e.g., [10,11,12]).
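In code, passing to the QUBO form only amounts to placing −λ on the diagonal of the missing-edges matrix. A small Python sanity check (our own sketch) that η^T J̃ η coincides with H(η):

```python
import numpy as np

def qubo_matrix(J, lam):
    """J-tilde: the missing-edges matrix J with -lambda on the diagonal,
    so that eta^T Jt eta = H(eta) for 0/1 configurations."""
    Jt = J.copy()
    np.fill_diagonal(Jt, -lam)
    return Jt

# toy graph: triangle {0, 1, 2} plus vertex 3 attached only to vertex 2
A = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
J = 0.5 * (1.0 - A)
np.fill_diagonal(J, 0.0)

lam = 0.25
Jt = qubo_matrix(J, lam)
eta = np.array([1.0, 0.0, 1.0, 1.0])
lhs = eta @ Jt @ eta                     # QUBO form
rhs = eta @ J @ eta - lam * eta.sum()    # original Hamiltonian
```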
It is worth noting that the objective function of an optimization problem can be interpreted as the Hamiltonian energy function of a statistical mechanics model with state space X . The statistical mechanics model’s ground states (minima of the Hamiltonian) correspond to configurations minimizing the objective function. The equilibrium distribution of a statistical mechanics model with Hamiltonian H is described by the Gibbs measure
$$\mu_G(\eta) = \frac{1}{Z}\,e^{-\beta H(\eta)}, \qquad \eta \in X$$
where β is a positive real parameter called the inverse temperature and Z is a normalizing constant called the partition function. As β → ∞ (low-temperature regime) this probability measure concentrates on the ground states of the system. This fact suggests that a solution to the optimization problem can be found by sampling from the Gibbs measure of the corresponding statistical mechanics model at low temperature.
To this aim, it is possible to define a Markov chain on X with transition probability P : X × X → [0, 1] whose stationary distribution is the Gibbs measure μ_G. If the chain is irreducible and aperiodic, then, irrespective of the initial configuration, the probability distribution of the chain after n steps tends to the stationary distribution as n → ∞. The sampling from the Gibbs measure can, then, be performed by letting the chain evolve for a sufficiently long time. More precisely, let μ_n(η) be the probability distribution after n steps of the chain starting from configuration η and let π be the stationary measure for the chain with transition probability P. Then, denoting by d_TV(·,·) the total variation distance,
$$d_{TV}(\mu_n(\eta), \pi) \to 0 \quad \text{as } n \to \infty.$$
The time it takes for the chain to have a distribution that is within a distance ε from the equilibrium distribution is referred to as the mixing time of the chain. In formulae, the mixing time of a Markov Chain with state space X and stationary distribution π is defined, for ε < 1 2 , as
$$t_{\mathrm{mix}}(\varepsilon) = \min\{n : d_{TV}(\mu_n(\eta), \pi) \leq \varepsilon\}.$$
In general, determining an a priori bound on the mixing time of the chain that guarantees that the algorithm is useful in practice (that is, that it has a mixing time which is at most polynomial in the size of the problem) may be non-trivial. Indeed, in the case of Erdős-Rényi random graphs, Jerrum ([4]) proved that the Metropolis process he considered for the Maximum Clique Problem has a super-polynomial mixing time. However, for instances that are not too big (number of nodes up to a few thousand) Monte Carlo algorithms may still provide good heuristic solutions (see, e.g., [5]).

2.3. The Metropolis Single Spin Flip Algorithm

A reference choice among Markov Chain Monte Carlo algorithms is the Metropolis update rule with single spin flip whose transition probability matrix is as follows.
For a given graph G = (V, E) with missing-edges matrix J, consider the Hamiltonian function H defined in (1). Let η^(i) be the configuration obtained from η by flipping the occupation number of the i-th component, that is, η^(i)_i = 1 − η_i and η^(i)_j = η_j for all j ≠ i.
$$P_M(\eta,\tau) = \begin{cases} \dfrac{1}{N}\,e^{-\beta\left[H(\tau)-H(\eta)\right]_+} & \text{if } \tau = \eta^{(i)},\; i = 1,\dots,N \\[6pt] 1 - \displaystyle\sum_{i=1}^{N}\frac{1}{N}\,e^{-\beta\left[H(\eta^{(i)})-H(\eta)\right]_+} & \text{if } \tau = \eta \\[6pt] 0 & \text{otherwise} \end{cases}$$
where [·]_+ denotes the positive part. In words, at each step an index i ∈ {1, …, N} is chosen at random and, if the configuration obtained by flipping the value of the i-th component of η has a Hamiltonian energy not larger than that of η, the occupation number of η_i is changed with probability one. Otherwise, the occupation number of η_i is changed with a probability that is exponentially small in the difference between the energy of the target configuration and the energy of the current one.
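For concreteness, one sweep of the single spin flip dynamics can be sketched as follows (Python; the paper's implementation is in Julia, and the energy-difference formula below assumes the QUBO form H(η) = η^T J̃ η with symmetric J̃; the toy instance is ours):

```python
import numpy as np

def metropolis_sweep(eta, Jt, beta, rng):
    """One sweep (N single spin flip attempts) of the Metropolis chain
    for H(eta) = eta^T Jt eta, with Jt symmetric."""
    N = len(eta)
    for _ in range(N):
        i = rng.integers(N)
        s = 1.0 - 2.0 * eta[i]        # +1 if we would set eta_i to 1, -1 otherwise
        # energy difference H(eta^(i)) - H(eta) for flipping component i
        delta = s * (2.0 * (Jt[i] @ eta) - 2.0 * Jt[i, i] * eta[i] + Jt[i, i])
        if delta <= 0 or rng.random() < np.exp(-beta * delta):
            eta[i] = 1.0 - eta[i]     # accept the flip
    return eta

# toy run on a 4-vertex graph (triangle {0, 1, 2} plus a pendant vertex 3)
A = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
Jt = 0.5 * (1.0 - A)
np.fill_diagonal(Jt, -0.25)           # lambda = 0.25 on the diagonal
eta = metropolis_sweep(np.zeros(4), Jt, beta=2.0, rng=np.random.default_rng(0))
```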
Let π M ( η ) = 1 Z e β H ( η ) . Then it is immediate to verify that P M satisfies the detailed balance condition
π M ( η ) P M ( η , τ ) = π M ( τ ) P M ( τ , η )
ensuring that π M is the stationary distribution of P M .
We remark that this algorithm is allowed to visit configurations that are not cliques of the graph G. However, in the limit β → ∞, if λ scales as 1/β, the algorithm is equivalent (see [5]) to the Metropolis algorithm considered by Jerrum in [4], which allows transitions only between cliques of G.

2.4. The “Pair Hamiltonian” Probabilistic Cellular Automaton

A probabilistic cellular automaton on X is a Markov Chain whose transition probability matrix can be written, for any η, τ ∈ X, as
$$P(\eta,\tau) = \prod_{i=1}^{N} P(\tau_i \mid \eta). \qquad (10)$$
In words, this means that at each step, the value of each component τ_i of the target configuration τ can be sampled independently of the values of all other τ_j, j ≠ i. From a computational point of view this fact is particularly appealing since, in principle, the updated value of each component of τ could be delegated to a dedicated computing core (see [13]). With the advent of massively parallel processors such as GPUs, this is an actual possibility, even on consumer hardware, for configurations with several hundreds of components.
In [14] the PCA Monte Carlo algorithm was introduced to study the minima of QUBO problems. When used to approach the maximum clique problem, the algorithm is as follows.
Consider the pair Hamiltonian function defined on pairs of configurations
$$H(\eta,\tau) = \beta\sum_{i,j}\tilde{J}_{ij}\,\eta_i\tau_j \;+\; q\sum_i\left[\eta_i(1-\tau_i) + \tau_i(1-\eta_i)\right],$$
where J ˜ is the matrix used in (5), and set the transition probability to be
$$P_A(\eta,\tau) = \frac{e^{-H(\eta,\tau)}}{\sum_{\tau'} e^{-H(\eta,\tau')}}.$$
Let h_i(η) = Σ_j J̃_ij η_j be the local energy field felt by the i-th component of η. Then H(η, τ) can be rewritten as H(η, τ) = β Σ_i h_i(η) τ_i + q Σ_i [η_i(1 − τ_i) + τ_i(1 − η_i)]. Consequently, the transition probabilities can be rewritten as
$$P_A(\eta,\tau) = \prod_i \frac{e^{-\beta h_i(\eta)\tau_i - q\left[\eta_i(1-\tau_i)+\tau_i(1-\eta_i)\right]}}{Z_i}$$
yielding
$$P(\tau_i = 1 \mid \eta) = \frac{e^{-\beta h_i(\eta) - q(1-\eta_i)}}{Z_i} \quad\text{and}\quad P(\tau_i = 0 \mid \eta) = \frac{e^{-q\eta_i}}{Z_i} \qquad (13)$$
with Z_i = e^{−βh_i(η) − q(1−η_i)} + e^{−qη_i}. Therefore P_A defines, indeed, a PCA in the sense of (10).
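A single PCA step is, in code, one matrix-vector product followed by componentwise independent sampling. A Python sketch of the update rule (13) (our illustration, not the paper's Julia implementation; the toy instance is ours):

```python
import numpy as np

def pca_step(eta, Jt, beta, q, rng):
    """One PCA step: every component of tau is sampled independently
    given eta, with the conditional probabilities of eq. (13)."""
    h = Jt @ eta                              # local energy fields h_i(eta)
    w1 = np.exp(-beta * h - q * (1.0 - eta))  # weight of tau_i = 1
    w0 = np.exp(-q * eta)                     # weight of tau_i = 0
    p1 = w1 / (w0 + w1)                       # P(tau_i = 1 | eta)
    return (rng.random(len(eta)) < p1).astype(float)

# with q large, flips are strongly suppressed and tau = eta with high probability
A = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
Jt = 0.5 * (1.0 - A)
np.fill_diagonal(Jt, -0.25)
eta0 = np.array([1.0, 1.0, 1.0, 0.0])         # the maximum clique {0, 1, 2}
tau = pca_step(eta0, Jt, beta=1.0, q=50.0, rng=np.random.default_rng(1))
```

Note that the only heavy operation is the product Jt @ eta, which is exactly what multicore and GPU linear algebra accelerates.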
By the symmetry of J ˜ , a straightforward computation shows that the detailed balance condition
$$P_A(\eta,\tau)\,\frac{\sum_{\tau'} e^{-H(\eta,\tau')}}{\sum_{\eta',\tau'} e^{-H(\eta',\tau')}} = P_A(\tau,\eta)\,\frac{\sum_{\eta'} e^{-H(\tau,\eta')}}{\sum_{\eta',\tau'} e^{-H(\eta',\tau')}}$$
holds and, hence,
$$\pi_A(\eta) = \frac{\sum_{\tau} e^{-H(\eta,\tau)}}{\sum_{\eta',\tau} e^{-H(\eta',\tau)}}$$
is the stationary measure of P A . Note that also this algorithm may visit all configurations and not only those that are cliques.
A closer look at equation (13) shows that, at each step, the dynamics tends to select, for the target configuration τ, the vertices connected to all vertices selected by configuration η. Indeed, for these components the local energy field is negative (equal to −λ). On the other hand, the probability of selecting, in τ, components for which there is at least one missing edge with the components selected by η is exponentially small (since λ < 1). Further, note that the term e^{−q[η_i(1−τ_i) + τ_i(1−η_i)]} reduces the probability that the occupation number of the i-th component of τ differs from that of η, so that the probability that η and τ differ at too many sites stays small.
To get an idea of why this algorithm is expected to work, it is possible to argue as follows. At first, observe that H(η, η) = βH(η) with H defined as in (5). Further, note that, as q gets large, the weight of pairs (η, τ) with η ≠ τ is exponentially depressed and π_A(η) becomes closer to the Gibbs measure e^{−βH(η)}/Z with Hamiltonian H and inverse temperature β.
In [14] and in [15] the authors considered QUBO instances where the coefficients of the matrix J̃ are realizations of independent identically distributed random variables with expected value zero, and showed that the PCA Monte Carlo performed quite well when compared to other heuristic techniques. Here, however, the elements of the matrix J̃ have a particular structure: they are negative on the diagonal and positive or null outside of it, and it is not, a priori, obvious what impact this structure could have on the practical performance of the algorithm.

2.5. The Shaken Dynamics

Shaken dynamics have been introduced in [16,17] in the process of finding efficient algorithms to sample from the Gibbs measure on spin systems (see [18,19]). They extend the class of PCA, described above, with transition probabilities of the type P_A(η, τ) = e^{−H(η,τ)} / Σ_{τ'} e^{−H(η,τ')} defined in terms of a pair Hamiltonian.
The transition probabilities of the Shaken dynamics come as a combination of two steps and are of the following type:
$$P_S(\eta,\tau) = \sum_{\sigma} \overrightarrow{P}(\eta,\sigma)\,\overleftarrow{P}(\sigma,\tau)$$
with
$$\overrightarrow{P}(\eta,\sigma) = \frac{e^{-\overrightarrow{H}(\eta,\sigma)}}{\sum_{\sigma'} e^{-\overrightarrow{H}(\eta,\sigma')}} \quad\text{and}\quad \overleftarrow{P}(\eta,\sigma) = \frac{e^{-\overleftarrow{H}(\eta,\sigma)}}{\sum_{\sigma'} e^{-\overleftarrow{H}(\eta,\sigma')}}.$$
The two functions $\overrightarrow{H}$ and $\overleftarrow{H}$ appearing in the definition of the transition probabilities must be such that $\overrightarrow{H}(\eta,\sigma) = \overleftarrow{H}(\sigma,\eta)$. If this is the case, then it follows that
$$\pi_S(\eta) = \frac{\sum_{\sigma} e^{-\overrightarrow{H}(\eta,\sigma)}}{\sum_{\eta',\sigma} e^{-\overrightarrow{H}(\eta',\sigma)}}$$
is the stationary (reversible) measure for P_S. To see this, observe that the detailed balance condition holds. Indeed, calling $Z_\eta = \sum_\sigma e^{-\overrightarrow{H}(\eta,\sigma)}$ and $Z = \sum_{\eta,\sigma} e^{-\overrightarrow{H}(\eta,\sigma)}$, and using $\overleftarrow{H}(\sigma,\tau) = \overrightarrow{H}(\tau,\sigma)$, we have
$$\pi_S(\eta)\,P_S(\eta,\tau) = \frac{Z_\eta}{Z}\sum_{\sigma}\frac{e^{-\overrightarrow{H}(\eta,\sigma)}}{Z_\eta}\,\frac{e^{-\overrightarrow{H}(\tau,\sigma)}}{\sum_{\tau'} e^{-\overrightarrow{H}(\tau',\sigma)}} = \frac{1}{Z}\sum_{\sigma}\frac{e^{-\overrightarrow{H}(\eta,\sigma)}\,e^{-\overrightarrow{H}(\tau,\sigma)}}{\sum_{\tau'} e^{-\overrightarrow{H}(\tau',\sigma)}}.$$
The last expression is symmetric under the exchange of η and τ and, therefore, equals $\pi_S(\tau)\,P_S(\tau,\eta)$.
Having defined the Shaken dynamics, we can describe how it can be used as a heuristic algorithm for the maximum clique problem.
Consider a graph G = (V, E), with |V| = N, its associated missing-edges matrix J and the corresponding matrix J̃ = J̃(λ) used to formulate the maximum clique problem as a QUBO (as in (5)).
Let $\overrightarrow{J}$ be such that $\overrightarrow{J} + \overrightarrow{J}^{\,T} = \tilde{J}$ and define
$$\overrightarrow{H}(\eta,\sigma) = \beta\sum_{i,j=1}^{N} \overrightarrow{J}_{ij}\,\eta_i\sigma_j + q\sum_i\left[\eta_i(1-\sigma_i)+\sigma_i(1-\eta_i)\right]$$
$$\overleftarrow{H}(\eta,\sigma) = \beta\sum_{i,j=1}^{N} (\overrightarrow{J}^{\,T})_{ij}\,\eta_i\sigma_j + q\sum_i\left[\eta_i(1-\sigma_i)+\sigma_i(1-\eta_i)\right].$$
Then the condition $\overrightarrow{H}(\eta,\sigma) = \overleftarrow{H}(\sigma,\eta)$ is verified. Further, observe that
$$\overrightarrow{H}(\eta,\eta) = \beta\,\eta^T \overrightarrow{J}\,\eta = \frac{\beta}{2}\,\eta^T(\overrightarrow{J}+\overrightarrow{J}^{\,T})\,\eta = \frac{\beta}{2}\,\eta^T\tilde{J}\,\eta = \frac{1}{2}\,\beta H(\eta)$$
where H(η) is the same as the one defined in (5).
Strictly speaking, the Shaken dynamics is not a PCA. However, the transition probabilities of each half-step are of the form (13) where, instead of the vector of fields h, the two vectors of fields $\overrightarrow{J}\cdot\eta$ and $\overrightarrow{J}^{\,T}\cdot\sigma$ appear alternately.
Also in this case, if the parameter q is large, the stationary measure π_S(η) is close to the Gibbs measure e^{−(β/2)H(η)}/Z with Hamiltonian H (the factor 1/2 in the exponent amounts to an effective halving of the inverse temperature) and, hence, the Shaken dynamics provides a heuristic algorithm for the maximum clique problem. Also this Markov chain lives on the whole set X and not only on the subset corresponding to cliques.
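One full step of the Shaken dynamics thus consists of two PCA-like half-steps. The Python sketch below uses, for illustration only, a simple upper-triangular split of J̃ (with half the diagonal on each side); this split and the toy instance are our own choices, while the split actually used in the experiments is described in Section 3:

```python
import numpy as np

def shaken_step(eta, Jf, beta, q, rng):
    """One full Shaken step (two half-steps).  Jf must satisfy
    Jf + Jf.T == Jt; each half-step has the same form as the PCA
    update but uses Jf (resp. Jf.T) in place of Jt."""
    def half(x, M):
        h = M @ x                                  # fields for this half-step
        w1 = np.exp(-beta * h - q * (1.0 - x))     # weight of setting a component to 1
        w0 = np.exp(-q * x)                        # weight of setting it to 0
        return (rng.random(len(x)) < w1 / (w0 + w1)).astype(float)
    sigma = half(eta, Jf)       # first half-step: fields Jf . eta
    return half(sigma, Jf.T)    # second half-step: fields Jf^T . sigma

# illustrative split: strict upper triangle plus half of the diagonal
A = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
Jt = 0.5 * (1.0 - A)
np.fill_diagonal(Jt, -0.25)
Jf = np.triu(Jt, 1) + 0.5 * np.diag(np.diag(Jt))
eta0 = np.array([1.0, 1.0, 1.0, 0.0])
tau = shaken_step(eta0, Jf, beta=1.0, q=50.0, rng=np.random.default_rng(2))
```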
In [17] the Shaken dynamics has been used, in the context of spin systems, to find the minima of a QUBO-like problem defined on a lattice, that is, a problem where the interaction J_ij was different from zero only for pairs i, j satisfying a certain relation. In that scenario, the Shaken dynamics appeared to be rather effective. However, in the case of the maximum clique problem, besides the requirement for J to be symmetric, there is no restriction on which (and how many!) pairs J_ij can be different from zero as long as i ≠ j. Therefore, the effectiveness of this approach to the maximum clique problem has to be assessed.

3. Results

We tested our algorithms on a set of benchmark graphs commonly considered in the literature (DIMACS benchmarks). The graphs considered have a number of nodes N ranging from 125 to 4000. The graphs taken into account are of several types, which can be retrieved from the graph id:
Erdős-Rényi graphs:
graphs with a fixed number of vertices and edges selected independently at random with uniform probability. These are graphs with id of type Cn.d (where n is the number of vertices and d/10 is the density (probability) of edges) and DSJCn_d (where n is the number of nodes and d/10 is the density of edges)
Steiner triple graphs:
with id MANN_aXX are graphs constructed from a clique formulation of the Steiner Triple Problem
Brockington graphs:
with id brockN_d where N is the number of vertices and d is a parameter
Sanchis graphs:
with id genn_p0.x_y where n is the number of vertices, p0.x is the density of edges and y is the size of the planted clique
Hamming graphs:
with id hammingx-y with parameters x and y
Keller graphs:
with id kellern and parameters n = 4, 5, 6
P-hat graphs:
with id p-hatn-x where n is the number of nodes and x is an identifier; graphs generated with p-hat have wider node degree spread and larger cliques than uniform graphs.
More details can be found in [20].
We formulated the problem as in (5) and fixed, for all graphs and all algorithms, λ = 0.25 .
As for the parameters β and q, since we do not have an a priori argument for an optimal choice, we tested several pairs of values. However, to determine the values of q for our tests, we took into account the size N (number of nodes) of the graph and chose values of q so as to have e^{−q} N = c_q √N for several values of c_q, as described below. The rationale for this choice is the fact that, for A and S, the term e^{−q} is proportional to the probability of “flipping” a component, neglecting the contribution depending on the “energy field” for that component. In this way we intended to change, at each iteration, a number of components of the order of √N.
The values we used are the following:
Metropolis ( M ):
β in the range 1, 1.5, 2, …, 4
PCA ( A ):
β in the range 1, 1.5, 2, …, 4 and c_q in the set {1/8, 1/4, 1/2, 1, 2, 3, 4}
Shaken dynamics ( S ):
β in the range 1, 2, …, 7 and c_q in the set {1/64, 1/32, 1/16, 1/8, 1/4, 1/2, 1}
These ranges for the parameters have been determined after some experimentation, in which we saw that certain ranges of parameters were not suitable for a given algorithm. In particular, we needed to find values of q big enough (that is, c_q sufficiently small) for the dynamics to change only a few components at each step. Note that not all pairs of parameters (β, q) that have been taken into account are effective for a given algorithm.
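In practice, given N and c_q, the value of q follows by solving e^{−q} N = c_q √N. A one-line helper (our own, under that assumption):

```python
import math

def q_from_cq(N, c_q):
    """Solve exp(-q) * N = c_q * sqrt(N) for q, i.e. q = ln(sqrt(N) / c_q)."""
    return math.log(math.sqrt(N) / c_q)

# e.g. N = 1024, c_q = 1/2: q = ln(2 * sqrt(1024)) = ln(64)
q_example = q_from_cq(1024, 0.5)
```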
For the Shaken dynamics S, we also had to make a choice for the matrix $\overrightarrow{J}$. We proceeded as follows. For each i ∈ {1, …, N} we set
$$\overrightarrow{J}_{ij'} = \begin{cases} \tilde{J}_{ij'}, & \text{with } j' = ((j-1) \bmod N) + 1, \text{ for } j \in \{i+1,\, i+2,\, \dots,\, i+\lceil N/2\rceil - 1\} \\[4pt] \tilde{J}_{ij'}/2, & \text{for } j = i + N/2 \text{ if } N \text{ is even} \\[4pt] 0 & \text{otherwise.} \end{cases}$$
Intuitively, for each row, the matrix $\overrightarrow{J}$ is obtained by retaining the (roughly) N/2 elements of J̃ “to the right” of the diagonal element. It is straightforward to verify that $\overrightarrow{J} + \overrightarrow{J}^{\,T} = \tilde{J}$.
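This construction can be sketched as follows (Python; we additionally halve the diagonal entry, an assumption of ours needed for the identity to hold exactly when the diagonal of J̃ carries −λ):

```python
import numpy as np

def split_matrix(Jt):
    """Build Jf from the symmetric matrix Jt: each row i keeps the entries
    at cyclic distance 1 .. ceil(N/2)-1 'to the right' of the diagonal,
    halves the entry at distance N/2 when N is even, and (our assumption)
    halves the diagonal, so that Jf + Jf.T == Jt."""
    N = Jt.shape[0]
    Jf = np.zeros_like(Jt)
    for i in range(N):
        for d in range(1, (N + 1) // 2):       # distances 1 .. ceil(N/2) - 1
            Jf[i, (i + d) % N] = Jt[i, (i + d) % N]
        if N % 2 == 0:                         # antipodal entry shared by both rows
            j = (i + N // 2) % N
            Jf[i, j] = 0.5 * Jt[i, j]
        Jf[i, i] = 0.5 * Jt[i, i]              # halve the diagonal (assumption)
    return Jf

# check the identity on generic symmetric matrices of odd and even size
M5 = np.arange(25, dtype=float).reshape(5, 5)
M5 = M5 + M5.T
M4 = np.arange(16, dtype=float).reshape(4, 4)
M4 = M4 + M4.T
```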
This choice also provides a reason why the values of β used for algorithm S are in a range that is, roughly, twice as wide as that used for M and A: the fields used to update the configuration at each “half-step” are proportional to $\overrightarrow{J}\cdot\eta$, whose components can be expected to be approximately one-half of the components of $\tilde{J}\cdot\eta$ (which are the fields determining the transitions of A).
Simulations were run for a fixed number of iterations, and the largest clique found in each run of any of the algorithms was recorded. For each set of parameters (β for M and (β, q) for A and S) we ran 10 simulations. The number of iterations has been set in the following way:
Metropolis ( M ):
100000 sweeps, where a sweep corresponds to N steps of the Markov Chain (that is, a total of 100000 · N “attempted” flips);
PCA ( A ):
200000 steps of the Markov Chain;
Shaken dynamics ( S ):
100000 complete steps of the Markov Chain (200000 half-steps).
Our intent was to perform a fair comparison of the three algorithms. However, it is not obvious what the right criterion to establish fairness should be. For instance, one could run the chains for the same number of “nominal” steps. However, this approach would be rather penalizing for the Metropolis algorithm, since it would take, on average, O(N log N) steps to give all vertices of the graph at least one opportunity to be selected. Another option would be to count the number of “attempted flips”. Since in both A and S all components are potentially updated, following this approach would require setting the number of steps for both A and S to be the same as the number of sweeps for M. However, if run on a single core, the computing time for a step of A or S is significantly shorter than the time required for a sweep of M (see Table 1). Finally, one could simply consider computation time. However, we feel that this criterion would be too dependent on the implementation details and the features of the computing machine. Because of these considerations, we decided to use a “hybrid” approach, using as a primary criterion the total number of attempted flips but giving an extra quota of iterations to the parallel dynamics (note that, in each full step, the Shaken dynamics tries to update all components twice) to take into account (in a conservative way) their faster execution times.
The three algorithms were implemented in Julia (version 1.10, see [21]) and executed on an Nvidia DGX-1 system equipped with an Intel Xeon CPU E5-2698 v4 @ 2.20GHz (80 cores). We considered only a CPU implementation of the algorithms. For linear algebra operations (matrix-vector products) we used BLAS libraries ([22,23]) with the default implementation shipped with the LinearAlgebra package of Julia. Random numbers were generated using the default random number generator available in Julia (which in version 1.10 implements the Xoshiro256++ algorithm [24]).
The Metropolis algorithm found the best-known clique 30 out of 37 times, proving to be, in practice, a rather effective tool for the maximum clique problem for instances whose size is not too large. The PCA algorithm A found the maximum clique 27 out of 37 times, whereas the Shaken dynamics found it 23 out of 37 times. In those cases where none of the algorithms found the maximum clique, the Metropolis is the one that found the largest clique most often. It should be noted, though, that in the case of the graph MANN_a81 the PCA found a better solution than the Metropolis, and in two more cases it found an equivalent solution.
In general, the performances of the two parallel algorithms, especially those of the PCA A, are not much worse in terms of the size of the largest clique found. Moreover, it is possible to argue that the slightly worse performances of A and S are partly compensated by their computational effectiveness. Even using a single thread for the computation of the energy field, algorithm A was between 5 and 40 times faster. Running with 8 threads, the speedup reached a factor beyond 100 for certain instances and was more than 50 in the majority of cases. The situation is not too different for the Shaken dynamics S. Here one has to take into account that, at each step, two configurations are visited: one for each half-step. A GPU implementation of the two algorithms is expected to yield an even greater speedup, especially on the larger instances. The increased speed could be advantageous in applications.
Looking at the two parallel algorithms, A appears to be better than S both in terms of the quality of the solution provided and in terms of speed (even though the Shaken dynamics explores two configurations at each step, the time to get a new configuration with the PCA appears to be less than 1/2 of the time required by the Shaken dynamics). This can likely be explained by the fact that the energy fields of the PCA can be computed as the matrix-vector product J̃ · η, where the matrix J̃ is symmetric, whereas the matrix used by the Shaken dynamics to compute the fields is not.

4. Discussion

We believe that the results presented above foster a further investigation into the behavior of Probabilistic Cellular Automata. Indeed, they appear to provide good results while taking great advantage of the features of the computing architectures available nowadays. This investigation should start, in our opinion, from a better understanding of the relation between the stationary measure of the PCA and the Gibbs measure. In particular, it would be interesting to determine, for any β̄, bounds on the parameters (β, q) such that the total variation distance between π_A (or π_S) and the Gibbs measure at inverse temperature β̄ is within a prescribed value ε.
In the previous section, we observed that the values of the largest clique obtained with the Metropolis algorithm M are slightly better than those obtained with A and S . It should be mentioned, however, that the dependence of the latter algorithms on two parameters makes their tuning more challenging than the tuning of the single parameter of the Metropolis dynamics. For this reason, it would be worthwhile to study, from a theoretical point of view, the behavior of both dynamics in the ( β , q ) plane. In particular, one would like to determine a “phase transition” curve separating a “high temperature” region, where the stationary measure is close to the uniform measure on X , from a “low temperature” region, where the stationary measure is concentrated on the minima of the Hamiltonian (1). This phase transition curve is expected to depend on the law with which the graph has been generated. An analogous study would be of interest for the Metropolis case, namely determining a critical value of β separating the high and low-temperature regions.
As already mentioned, the Metropolis algorithm discussed above is equivalent, at large inverse temperature, to the algorithm described by Jerrum in [4]. In that paper, it is shown that the mixing time of the chain, in the case of Erdős-Rényi graphs, is not polynomial in the number of vertices. Further, the typical size of the cliques found by the Metropolis process in the large-graph limit is determined there. It would be interesting to perform a similar analysis for the two parallel algorithms introduced here.
From the computational point of view, the most expensive task of the PCA and the Shaken dynamics is the computation of the vector of fields, which is of the type h ( η ) = J · η . This matrix-vector product can be performed using highly optimized linear algebra libraries, which take great advantage of multicore processors and GPUs.
As a consequence, the algorithm will benefit, at no additional cost, from improvements in the performance of linear algebra libraries. Since these libraries are ubiquitous in computer applications, improvements in this direction will be driven by the efforts of communities coming from the most diverse domains.
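As an illustration of how the field computation sits at the heart of a parallel step, the following is a hypothetical Python/NumPy sketch (the paper's experiments use Julia with BLAS). It assumes a generic heat-bath-type parallel update in which every component is resampled simultaneously given its field; the acceptance function and the way q enters are illustrative stand-ins, not the paper's exact rule.

```python
import numpy as np

def pca_step(J, eta, beta, q, rng):
    """One parallel update: all N components are refreshed simultaneously.

    J    : (N, N) coupling matrix; h = J @ eta is the vector of local fields
    eta  : (N,) current 0/1 configuration
    beta : inverse temperature
    q    : inertia parameter favoring eta'_i = eta_i (illustrative form)
    """
    h = J @ eta                                   # all N fields in one BLAS call
    # Illustrative heat-bath probability that component i equals 1:
    p_one = 1.0 / (1.0 + np.exp(-beta * (h + q * (2.0 * eta - 1.0))))
    return (rng.random(eta.shape) < p_one).astype(eta.dtype)

rng = np.random.default_rng(1)
N = 200
A = rng.standard_normal((N, N))
J = (A + A.T) / 2.0
eta = rng.integers(0, 2, size=N).astype(float)
eta = pca_step(J, eta, beta=1.0, q=0.5, rng=rng)
```

The dominant cost is the single `J @ eta` line, which is exactly the operation delegated to the linear algebra library.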
Note that this type of advantage could be exploited even further. Indeed, it is possible to let k PCA chains evolve in parallel. Denoting by E the matrix whose columns are the configurations of the k chains, the collection of the k vectors of fields { h i ( η ( m ) ) } i = 1 , … , N ; m = 1 , … , k can be computed as the matrix-matrix product J ˜ · E . Performing this operation is, in general, significantly faster than computing k separate matrix-vector products. Note that, for each chain, a different pair of parameters β and q could be used without significantly affecting the computation time.
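A hypothetical sketch of this batching idea follows: stacking the k configurations as columns of a matrix E turns k field computations into a single matrix-matrix product, and the per-chain parameters enter only in the cheap elementwise step afterwards. The update rule is the same illustrative stand-in as above, not the paper's exact dynamics.

```python
import numpy as np

def batched_fields(J, E):
    """Fields for k chains at once: column m of the result is h(eta^(m))."""
    return J @ E                          # one GEMM instead of k GEMVs

rng = np.random.default_rng(2)
N, k = 300, 16
A = rng.standard_normal((N, N))
J = (A + A.T) / 2.0
E = rng.integers(0, 2, size=(N, k)).astype(float)

H = batched_fields(J, E)                  # (N, k) matrix of fields
betas = np.linspace(0.5, 2.0, k)          # a different beta per chain...
qs = np.full(k, 0.5)                      # ...and possibly a different q
# Elementwise update: per-chain parameters broadcast over the columns,
# so varying (beta, q) across chains adds essentially no cost.
P = 1.0 / (1.0 + np.exp(-betas * (H + qs * (2.0 * E - 1.0))))
E_new = (rng.random(E.shape) < P).astype(float)
```

Since the GEMM dominates the cost, the k chains evolve for roughly the price of one well-batched operation.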
The possibility of running multiple chains at a time with different inverse temperatures β makes both dynamics A and S suitable for algorithms like parallel tempering [25], which has been applied to the clique problem using standard single spin flip Metropolis dynamics (see, e.g., [7]).
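The replica-exchange move at the core of parallel tempering [25] is simple to state; here is a hedged sketch of the standard swap-acceptance step between two replicas at adjacent inverse temperatures (the energies and temperature ladder are placeholders, and the surrounding chain updates are omitted).

```python
import numpy as np

def try_swap(energies, betas, i, rng):
    """Attempt to swap replicas i and i+1 with the usual Metropolis rule.

    Acceptance probability:
        min(1, exp((beta_i - beta_{i+1}) * (E_i - E_{i+1}))).
    Returns True if the swap is accepted; the caller then exchanges the
    configurations (and keeps the temperature ladder fixed).
    """
    delta = (betas[i] - betas[i + 1]) * (energies[i] - energies[i + 1])
    return bool(rng.random() < min(1.0, np.exp(delta)))

rng = np.random.default_rng(3)
betas = np.array([0.5, 1.0, 2.0])         # placeholder temperature ladder
energies = np.array([-12.0, -10.0, -15.0])  # placeholder replica energies
accepted = try_swap(energies, betas, 0, rng)
```

With the values above, delta = (0.5 - 1.0) · (-12.0 + 10.0) = 1 > 0, so the swap is always accepted; in a full implementation each replica would be advanced with A or S between swap attempts.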
Note that Markov Chain Monte Carlo algorithms defined in terms of a “pair Hamiltonian” were already considered in [5,26,27]. The algorithm considered in those works (referred to as “Cavity”) also had the feature of updating more than one component of the configuration at each step. However, in those works, the cardinality of the configuration stayed fixed throughout the whole evolution of the chain and, therefore, its components could not be updated independently, limiting, in this way, the possibility of exploiting parallel computing architectures to their fullest.
As a final comment, we would like to mention that both parallel algorithms discussed in this paper may have applications outside combinatorial optimization. As far as the PCA is concerned, it has been suggested in [28] that this Markov Chain could be used to sample Exponential Random Graphs in the generation of synthetic power grids. As for the shaken dynamics, it has been suggested in [17,29] that it may provide a microscopic model for tidal dissipation.

Funding

This work has been partially supported by PRIN 2022 PNRR: "RETINA: REmote sensing daTa INversion with multivariate functional modeling for essential climAte variables characterization" (Project Number: P20229SH29, CUP: J53D23015950001) funded by the European Union under the Italian National Recovery and Resilience Plan (NRRP) of NextGenerationEU, under the Italian Ministry of University and Research (MUR).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

We downloaded the instances of the graphs analyzed in this paper from https://iridia.ulb.ac.be/~fmascia/maximum_clique/. Information on the generators used to create the instances and their source code can be found in the DIMACS ftp archive at http://archive.dimacs.rutgers.edu/pub/challenge/graph/.

Acknowledgments

The author thanks the Department of Mathematics and Physics of the University of Rome "Tre" which provided computing resources.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MCMC Markov Chain Monte Carlo
PCA Probabilistic cellular automaton or Probabilistic cellular automata
PCAMC PCA Markov Chain
QUBO Quadratic Unconstrained Binary Optimization

References

  1. Karp, R.M. Reducibility among Combinatorial Problems; Springer US: Boston, MA, 1972; pp. 85–103. [Google Scholar]
  2. Marino, R.; Buffoni, L.; Zavalnij, B. A Short Review on Novel Approaches for Maximum Clique Problem: from Classical algorithms to Graph Neural Networks and Quantum algorithms. arXiv preprint arXiv:2403.09742, 2024. [Google Scholar]
  3. Häggström, O. Finite Markov chains and algorithmic applications; Cambridge University Press, 2002; Vol. 52. [Google Scholar]
  4. Jerrum, M. Large cliques elude the Metropolis process. Random Structures & Algorithms 1992, 3, 347–359. [Google Scholar]
  5. Iovanella, A.; Scoppola, B.; Scoppola, E. Some spin glass ideas applied to the clique problem. Journal of Statistical Physics 2007, 126, 895–915. [Google Scholar] [CrossRef]
  6. Montanari, A. Finding one community in a sparse graph. Journal of Statistical Physics 2015, 161, 273–299. [Google Scholar] [CrossRef]
  7. Angelini, M.C. Parallel tempering for the planted clique problem. Journal of Statistical Mechanics: Theory and Experiment 2018, 2018, 073404. [Google Scholar]
  8. Punnen, A.P. (Ed.) The Quadratic Unconstrained Binary Optimization Problem; Springer International Publishing, 2022. [Google Scholar]
  9. Glover, F.; Kochenberger, G.; Du, Y. Quantum Bridge Analytics I: a tutorial on formulating and using QUBO models. Annals of Operations Research 2019, 314, 141–183. [Google Scholar] [CrossRef]
  10. Baioletti, M.; Santini, F. Abstract Argumentation Goes Quantum: An Encoding to QUBO Problems. Pacific Rim International Conference on Artificial Intelligence. Springer, 2022, pp. 46–60.
  11. Glover, F.; Kochenberger, G.; Ma, M.; Du, Y. Quantum Bridge Analytics II: QUBO-Plus, network optimization and combinatorial chaining for asset exchange. Annals of Operations Research 2022, 314, 185–212. [Google Scholar] [CrossRef]
  12. Tasseff, B.; Albash, T.; Morrell, Z.; Vuffray, M.; Lokhov, A.Y.; Misra, S.; Coffrin, C. On the emerging potential of quantum annealing hardware for combinatorial optimization. Journal of Heuristics.
  13. Fukushima-Kimura, B.H.; Handa, S.; Kamakura, K.; Kamijima, Y.; Kawamura, K.; Sakai, A. Mixing time and simulated annealing for the stochastic cellular automata. Journal of Statistical Physics 2023, 190, 79. [Google Scholar] [CrossRef]
  14. Scoppola, B.; Troiani, A. Gaussian Mean Field Lattice Gas. Journal of Statistical Physics 2018, 170, 1161–1176. [Google Scholar] [CrossRef]
  15. Isopi, M.; Scoppola, B.; Troiani, A. On some features of Quadratic Unconstrained Binary Optimization with random coefficients. arXiv preprint arXiv:2402.17059, 2024. [Google Scholar] [CrossRef]
  16. Apollonio, V.; D’autilia, R.; Scoppola, B.; Scoppola, E.; Troiani, A. Criticality of Measures on 2-d Ising Configurations: From Square to Hexagonal Graphs. Journal of Statistical Physics 2019, 177, 1009–1021. [Google Scholar] [CrossRef]
  17. Apollonio, V.; D’Autilia, R.; Scoppola, B.; Scoppola, E.; Troiani, A. Shaken dynamics: an easy way to parallel Markov Chain Monte Carlo. Journal of Statistical Physics 2022, 189, 39. [Google Scholar] [CrossRef]
  18. D’Autilia, R.; Andrianaivo, L.N.; Troiani, A. Parallel simulation of two-dimensional Ising models using probabilistic cellular automata. Journal of Statistical Physics 2021, 184, 1–22. [Google Scholar] [CrossRef]
  19. Scoppola, B.; Troiani, A.; Veglianti, M. Shaken dynamics on the 3d cubic lattice. Electronic Journal of Probability 2022, 27, 1–26. [Google Scholar] [CrossRef]
  20. Johnson, D.S.; Trick, M.A. Cliques, coloring, and satisfiability: second DIMACS implementation challenge, October 11-13, 1993; Vol. 26, American Mathematical Soc., 1996.
  21. Bezanson, J.; Edelman, A.; Karpinski, S.; Shah, V.B. Julia: A fresh approach to numerical computing. SIAM review 2017, 59, 65–98. [Google Scholar] [CrossRef]
  22. Lawson, C.L.; Hanson, R.J.; Kincaid, D.R.; Krogh, F.T. Basic linear algebra subprograms for Fortran usage. ACM Transactions on Mathematical Software (TOMS) 1979, 5, 308–323. [Google Scholar] [CrossRef]
  23. Dongarra, J.J.; Croz, J.D.; Hammarling, S.; Hanson, R.J. An extended set of FORTRAN basic linear algebra subprograms. ACM Trans. Math. Softw. 1988, 14, 1–17. [Google Scholar] [CrossRef]
  24. Blackman, D.; Vigna, S. Scrambled linear pseudorandom number generators. ACM Transactions on Mathematical Software (TOMS) 2021, 47, 1–32. [Google Scholar] [CrossRef]
  25. Hukushima, K.; Nemoto, K. Exchange Monte Carlo method and application to spin glass simulations. Journal of the Physical Society of Japan 1996, 65, 1604–1608. [Google Scholar] [CrossRef]
  26. Viale, M. Il problema della massima clique: teoria & pratica. PhD thesis 2009. [Google Scholar]
  27. Gaudilliere, A.; Scoppola, B.; Scoppola, E.; Viale, M. Phase transitions for the cavity approach to the clique problem on random graphs. Journal of statistical physics 2011, 145, 1127–1155. [Google Scholar] [CrossRef]
  28. Giacomarra, F.; Bet, G.; Zocca, A. Generating Synthetic Power Grids Using Exponential Random Graph Models. PRX Energy 2024, 3, 023005. [Google Scholar] [CrossRef]
  29. Pinzari, G.; Scoppola, B.; Veglianti, M. Spin orbit resonance cascade via core shell model. Application to Mercury and Ganymede. arXiv preprint arXiv:2402.07650, 2024. [Google Scholar]
Table 1. This table shows the speedups of the A and S algorithms with respect to the Metropolis algorithm. The speedup is computed taking into account the execution time of 1000 iterations of each algorithm (sweeps for M and steps for A and S). Computation times are determined by averaging over 5 runs. For each of the two parallel algorithms, the columns with headings 1, 2, 4, and 8 give the speedup when the number of threads used by the BLAS library is set to 1, 2, 4, and 8 respectively. Column N gives the number of nodes in the graph.
graph id N A S
1 2 4 8 1 2 4 8
C1000.9 1000 10.2 18.3 127.0 177.2 3.1 5.6 46.7 50.5
C125.9 125 8.2 8.0 8.0 8.0 3.3 3.7 4.0 3.4
C2000.5 2000 7.7 14.4 25.8 48.6 2.2 4.1 7.6 12.2
C2000.9 2000 5.7 7.0 12.0 19.6 1.6 2.0 3.6 6.1
C250.9 250 22.3 24.5 16.5 18.3 9.0 8.6 6.9 7.9
C4000.5 4000 5.5 6.8 12.6 22.3 1.5 1.9 3.6 6.0
C500.9 500 37.9 60.3 84.8 107.3 14.3 21.7 31.0 42.3
DSJC1000_5 1000 6.0 10.7 85.0 128.9 1.8 3.3 6.3 44.8
DSJC500_5 500 30.9 47.1 66.4 82.9 11.2 16.7 24.3 32.6
MANN_a27 378 28.7 41.1 55.6 66.4 10.6 15.6 21.5 26.4
MANN_a45 1035 6.1 10.8 87.4 126.6 1.8 3.3 6.2 23.7
MANN_a81 3321 6.3 12.0 22.3 40.3 2.5 3.4 6.6 12.4
brock200_2 200 29.0 27.0 33.3 36.1 11.7 13.8 13.5 14.5
brock200_4 200 29.9 27.4 33.3 36.3 12.4 10.5 13.0 13.9
brock400_2 400 33.7 31.8 43.1 56.2 12.5 17.1 16.6 21.0
brock400_4 400 33.0 43.2 42.8 54.4 12.4 17.0 15.8 20.9
brock800_2 800 38.1 59.7 57.0 79.8 12.1 20.0 20.3 29.4
brock800_4 800 37.5 59.4 54.8 79.6 1.8 19.6 20.3 29.4
gen200_p0.9_44 200 27.6 25.8 31.9 34.4 11.5 13.8 13.0 14.5
gen200_p0.9_55 200 29.5 35.6 32.0 34.2 11.8 14.1 13.3 14.4
gen400_p0.9_55 400 33.6 31.5 38.0 48.8 13.5 12.4 16.1 22.4
gen400_p0.9_65 400 35.7 30.2 49.0 58.3 12.9 14.6 14.9 21.7
gen400_p0.9_75 400 34.0 46.0 45.6 58.1 13.3 16.0 17.3 22.4
hamming10-4 1024 5.8 66.1 66.6 94.5 1.8 3.5 21.4 32.7
hamming8-4 256 25.0 28.8 32.1 35.4 7.6 10.8 12.5 15.0
keller4 171 22.6 23.3 23.2 23.2 9.0 10.6 10.3 11.3
keller5 776 39.1 59.4 58.9 84.3 13.1 15.8 19.5 28.5
keller6 3361 5.5 6.8 12.6 23.0 1.4 2.6 3.5 6.6
p_hat1500-1 1500 5.7 10.3 102.4 94.4 1.7 3.1 5.6 6.2
p_hat1500-2 1500 5.8 10.7 98.0 108.4 2.7 18.5 35.4 41.9
p_hat1500-3 1500 15.3 26.7 38.1 49.7 4.6 7.5 11.4 14.5
p_hat300-1 300 13.4 12.7 17.5 8.3 5.4 6.2 7.8 8.7
p_hat300-2 300 14.2 20.6 25.8 17.6 4.6 7.6 7.2 8.8
p_hat300-3 300 12.1 12.9 16.3 18.2 5.0 5.2 6.7 7.9
p_hat700-1 700 15.6 22.8 28.4 31.1 5.1 7.2 9.4 11.1
p_hat700-2 700 15.1 23.3 30.6 28.0 5.2 6.0 9.7 11.4
p_hat700-3 700 16.9 22.6 28.8 32.9 5.7 7.9 8.3 11.8
Table 2. This table shows the largest clique found with each algorithm for the examined benchmark graphs. Bold numbers highlight those cases where the cardinality of the largest clique found by any of the algorithms considered in this paper coincides with the clique number of the graph or its best lower bound available in the literature (to the best of our knowledge). Underlined numbers signal the largest clique obtained with any of the algorithms considered in the paper when this number is smaller than the clique number of the graph (or its best lower bound). Column A gives the results for the Probabilistic Cellular Automaton, column S the results for the Shaken dynamics, and column M the results of the Metropolis single flip dynamics. Column ω ( G ) gives the value of the clique number of the graph (or its best lower bound). The value of ω ( G ) is marked with an asterisk when it is known to be exact (see [2]). Column N gives the number of nodes in the graph.
graph id N A S M ω ( G )
C1000.9 1000 67 65 68 68
C125.9 125 34 34 34 34
C2000.5 2000 16 15 16 16*
C2000.9 2000 73 72 76 80
C250.9 250 44 44 44 44
C4000.5 4000 17 16 17 18*
C500.9 500 57 57 57 57
DSJC1000_5 1000 15 14 15 15*
DSJC500_5 500 13 13 13 13*
MANN_a27 378 124 123 124 126*
MANN_a45 1035 335 334 336 345*
MANN_a81 3321 1084 1080 1083 1100*
brock200_2 200 12 12 12 12*
brock200_4 200 17 17 17 17*
brock400_2 400 29 25 29 29*
brock400_4 400 33 32 33 33*
brock800_2 800 20 20 21 24*
brock800_4 800 20 20 21 26*
gen200_p0.9_44 200 44 44 44 44
gen200_p0.9_55 200 55 55 55 55
gen400_p0.9_55 400 55 55 55 55
gen400_p0.9_65 400 65 65 65 65
gen400_p0.9_75 400 75 75 75 75
hamming10-4 1024 40 40 40 40
hamming8-4 256 16 16 16 16
keller4 171 11 11 11 11*
keller5 776 27 27 27 27*
keller6 3361 55 55 59 59*
p_hat1500-1 1500 12 11 12 12*
p_hat1500-2 1500 65 65 65 65*
p_hat1500-3 1500 94 94 94 94*
p_hat300-1 300 8 8 8 8*
p_hat300-2 300 25 25 25 25*
p_hat300-3 300 36 36 36 36*
p_hat700-1 700 11 11 11 11*
p_hat700-2 700 44 44 44 44*
p_hat700-3 700 62 62 62 62*