Abstract
In recent years, we have witnessed remarkable progress in algorithms for Boolean satisfiability (SAT). These algorithms have been especially successful in a large number of industrial or real-world applications, where SAT solvers are nowadays an essential core component of the solving process. Interestingly, these applications span a very diverse and heterogeneous range of domains, such as hardware verification, planning, and cryptography, among others. Unfortunately, the reasons for the good performance of these solvers on such a variety of industrial benchmarks are not yet completely understood. Since the efficiency of SAT solvers is fundamental in many domains, obtaining a better understanding of these algorithms and of the reasons for their good performance is crucial. To shed light on this question, SAT solvers are often viewed as complex systems with many interconnected components (e.g., conflict analysis and the learning mechanism, clause database management, search restarts) that interact in many unpredictable ways. It is commonly believed that the emergent behavior of these complex systems exploits a certain underlying structure of the SAT formula, shared by the majority of industrial problems regardless of the domain they come from. Recently, there have been several attempts to characterize this structure through the lens of complex networks, with the purpose of better understanding the success of the solvers and, potentially, improving them. In this paper, we analyze the structure of industrial SAT instances through the lens of self-similarity, and study how the execution of SAT solvers affects that structure. Many real-world graphs exhibit a self-similar structure (with a small fractal dimension), which means that after rescaling (replacing groups of nodes by a single node), the same kind of structure can be observed.
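The idea of rescaling can be made concrete with a box-covering count. The sketch below is a hypothetical illustration, not the procedure used in this paper: it assumes the formula is represented as a variable incidence graph (variables are nodes, and two variables are adjacent if they co-occur in some clause), covers the graph greedily with boxes of radius r (a simple "burning" variant), and estimates the fractal dimension d from the power law N(l) ~ l^(-d), where N(l) is the number of boxes of size l = 2r + 1. The function names and the fixed order in which box centers are chosen are assumptions made for illustration.

```python
from collections import deque
import math


def variable_incidence_graph(clauses):
    """Build an adjacency map: variables are nodes, and two variables are
    adjacent if they appear together in at least one clause.
    Clauses are lists of non-zero integer literals (DIMACS style)."""
    adj = {}
    for clause in clauses:
        vs = {abs(lit) for lit in clause}
        for v in vs:
            adj.setdefault(v, set()).update(vs - {v})
    return adj


def box_count(adj, radius):
    """Greedy box covering (hypothetical simple variant): visit nodes in a
    fixed order; each still-uncovered node becomes a box center and covers
    every node within `radius` hops of it. Returns the number of boxes."""
    uncovered = set(adj)
    boxes = 0
    for center in sorted(adj):
        if center not in uncovered:
            continue
        boxes += 1
        dist = {center: 0}
        queue = deque([center])
        while queue:
            u = queue.popleft()
            uncovered.discard(u)  # no-op if u was covered by an earlier box
            if dist[u] == radius:
                continue  # do not expand beyond the box radius
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    return boxes


def fractal_dimension(adj, radii):
    """Estimate d as minus the least-squares slope of log N(l) versus log l,
    with box size l = 2 * radius + 1."""
    xs = [math.log(2 * r + 1) for r in radii]
    ys = [math.log(box_count(adj, r)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return -cov / var


# Example: a chain of 1000 variables linked by binary clauses.
chain = variable_incidence_graph([[i, i + 1] for i in range(1, 1000)])
print(fractal_dimension(chain, [1, 2, 4, 8]))
```

Because box centers are taken in a fixed order rather than chosen to maximize the number of nodes each box covers, this estimate is biased: on the chain above, whose true dimension is 1, it returns a value somewhat below 1. More careful box-covering algorithms reduce this bias.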
In our analysis, in which we represent SAT instances as graphs, we observe that many industrial SAT formulas exhibit the same kind of self-similar structure. Moreover, we analyze how this structure evolves as a result of learning new clauses during the search. In particular, we observe that learned clauses usually contain variables that are close to each other in the graph representation of the formula. That is, the learning mechanism tends to work locally. In contrast, on random SAT formulas, which do not exhibit any structure at all, the learning mechanism is unable to generate such local clauses. This difference helps to explain the success of modern SAT solvers on industrial problems.
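One way to quantify whether a learned clause is "local" is to measure how far apart its variables are in the graph of the original formula. The sketch below assumes a variable incidence graph and uses, as a hypothetical locality measure, the largest shortest-path distance between any two variables of the clause; the measure used in the paper may differ, and the name `clause_spread` is illustrative only.

```python
from collections import deque
import math


def variable_incidence_graph(clauses):
    """Variables are nodes; two variables are adjacent if they co-occur
    in at least one clause (literals are non-zero DIMACS-style integers)."""
    adj = {}
    for clause in clauses:
        vs = {abs(lit) for lit in clause}
        for v in vs:
            adj.setdefault(v, set()).update(vs - {v})
    return adj


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable variable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def clause_spread(adj, clause):
    """Largest graph distance between two variables of `clause`
    (hypothetical locality measure). Returns math.inf if two of its
    variables lie in different connected components, 0 for unit clauses."""
    vs = sorted({abs(lit) for lit in clause})
    spread = 0
    for i, a in enumerate(vs):
        dist = bfs_distances(adj, a)  # one BFS per variable of the clause
        for b in vs[i + 1:]:
            spread = max(spread, dist.get(b, math.inf))
    return spread


# Example: a chain formula over variables 1..100.
adj = variable_incidence_graph([[i, -(i + 1)] for i in range(1, 100)])
print(clause_spread(adj, [3, -4, 5]))  # variables close in the graph
print(clause_spread(adj, [1, -60]))    # variables far apart in the graph
```

To compare industrial and random formulas along the lines described above, one could average `clause_spread` over the clauses learned by a solver and contrast it with the average over random sets of variables of the same sizes; collecting the learned clauses requires instrumenting a solver and is outside the scope of this sketch.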