Categories capture the mathematical essence of composition, the process by which many parts make a whole. A category $\mathcal{C}$ is thus determined by a collection of objects, denoted $\mathrm{ob}(\mathcal{C})$, and, for each pair of objects $a$ and $b$, a set $\mathcal{C}(a,b)$ of morphisms from $a$ to $b$. We denote such a morphism by $f \colon a \to b$, and say it has source $a$ and target $b$. Morphisms with compatible source and target may be composed, so that $f \colon a \to b$ and $g \colon b \to c$ yield $g \circ f \colon a \to c$, and each object $a$ is assigned an identity morphism, $\mathrm{id}_a \colon a \to a$. The morphisms of a category are required to satisfy two axioms: unitality, saying $f \circ \mathrm{id}_a = f = \mathrm{id}_b \circ f$; and associativity, saying $h \circ (g \circ f) = (h \circ g) \circ f$, meaning we can simply write $h \circ g \circ f$ for consecutive composition. A basic example of a category is the category $\mathbf{Set}$, whose objects are sets $X$ and whose morphisms are functions $f \colon X \to Y$.
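As a concrete illustration of the two axioms in $\mathbf{Set}$, the following minimal sketch checks unitality and associativity for some small hand-picked functions (the functions themselves are invented for the example):

```python
# Composition in the category Set, on finite sets.

def compose(g, f):
    """Return g ∘ f: 'first apply f, then g'."""
    return lambda x: g(f(x))

identity = lambda x: x

# Morphisms f : {0,1,2} -> {0,1}, g : {0,1} -> {'a','b'}, h : {'a','b'} -> {True,False}
f = lambda x: x % 2
g = lambda x: 'ab'[x]
h = lambda c: c == 'a'

A = [0, 1, 2]  # the source object of f

# Unitality: f ∘ id = f = id ∘ f
assert all(compose(f, identity)(x) == f(x) == compose(identity, f)(x) for x in A)
# Associativity: h ∘ (g ∘ f) = (h ∘ g) ∘ f
assert all(compose(h, compose(g, f))(x) == compose(compose(h, g), f)(x) for x in A)
print("category axioms hold for this example")
```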
5.1. `Polynomial’ generative models
At school, we learn about polynomial functions, such as $p(y) = y^3 + 3y^2 + 2$; a polynomial functor is to this concept precisely what a functor is to a function. Formally, one merely changes the variables, coefficients and exponents in the expression from numbers to sets¹. In an expression such as $y^A + B\,y^C$, we interpret the exponential $y^A$ as the representable functor $X \mapsto X^A$, $B\,y^C$ as the product functor $X \mapsto B \times X^C$, and $+$ as the disjoint union of sets, so that all together, the expression encodes the functor $X \mapsto X^A + B \times X^C$.
Every polynomial can be written in the form of a sum (disjoint union) of representable functors, $\sum_{i \in I} y^{p_i}$, for some indexing set $I$ and collection of exponents $p_i$; for example, we can write $y^3 + 3y^2 + 2$ as $y^3 + y^2 + y^2 + y^2 + 1 + 1$, where $1$ is the 1-element set $\{\ast\}$ (note that $y^0 = 1$). Therefore, we will henceforth summarize the data of a polynomial $p$ as $p = \sum_{i \in p(1)} y^{p[i]}$, where we now write $p(1)$ for the indexing set.
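Concretely, a finite polynomial is just a family of direction-sets indexed by positions. A minimal sketch (the dictionary encoding and names are illustrative, not from the text) evaluates $p(y) = y^3 + 3y^2 + 2$ at a finite set by counting its elements:

```python
from itertools import product

# A finite polynomial functor as { position i : set of directions p[i] }.
# p(y) = y^3 + 3y^2 + 2 has six positions, with direction-set sizes
# 3, 2, 2, 2, 0, 0 — one representable summand y^{p[i]} per position.
p = {
    'cube': {0, 1, 2},
    'sq_a': {0, 1}, 'sq_b': {0, 1}, 'sq_c': {0, 1},
    'pt_a': set(),  'pt_b': set(),
}

def evaluate(p, X):
    """p(X) = disjoint union, over positions i, of the functions p[i] -> X."""
    return [(i, assignment)
            for i, dirs in p.items()
            for assignment in product(X, repeat=len(dirs))]

X = {0, 1}
# |p(2)| = 2^3 + 3 * 2^2 + 2 = 22
print(len(evaluate(p, X)))  # -> 22
```

Note that `evaluate(p, {0})` has 6 elements: one per position, recovering the indexing set $p(1)$.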
The mathematics of polynomial functors supplies a perhaps surprisingly rich formalism for describing interacting systems such as intelligent agents. We can think of a polynomial $p$ as describing the `interface’ or `boundary’ of such a system: each element $i$ of $p(1)$ represents a possible shape or configuration that the system may adopt, or the possible actions that it may take; and each exponent $p[i]$ represents the set of possible `inputs’ that it may expect (such as sense-data), having adopted configuration $i$.
Because the type of expected sense-data may depend on the configuration adopted (just as you don’t expect to `see’ when you close your eyes), this generalizes the usual notion of a Markov blanket in active inference to something more dynamical. We can thus model an active inference agent with boundary polynomial $p$ as predicting the activity of its boundary $p$. That is to say, we collect the exponent sets together into their disjoint union $\sum_{i \in p(1)} p[i]$ and then understand the agent as predicting a distribution over the whole set $\sum_{i \in p(1)} p[i]$. This amounts to predicting both its configurations $i$ (hence, its actions) and, compatibly, its sense-data in each $p[i]$. If we restrict each $p[i]$ to be the same (so there is no dependence of sense-data on configuration), then we can recover the standard Markov blanket: if we set $S$ to be the sense-data and $A$ the actions, then $p = A\,y^S$.
Being a category-theoretic formalism, one doesn’t just have objects (polynomials), but also morphisms between them: these encode the data of how agents with polynomial interfaces may interact; in particular, they encode how systems may be `nested’ within each other. Thus, a morphism $\varphi \colon p \to q$ encodes how a system with boundary $p$ may be nested within a system with boundary $q$, and consists of a pair of a `forwards’ function $\varphi_1 \colon p(1) \to q(1)$ (that encodes how $p$-configurations or $p$-actions are translated into $q$-configurations) and a family of `backwards’ functions $\varphi^{\#}_i \colon q[\varphi_1(i)] \to p[i]$ (that encodes dually how $q$-sense-data is translated into $p$-sense-data). Polynomials and their morphisms collect into a category, $\mathbf{Poly}$.
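In the finite dictionary encoding sketched earlier, a morphism is such a forwards/backwards pair, and its well-formedness can be checked mechanically (the polynomials and maps below are invented for illustration):

```python
# A morphism of polynomials φ : p -> q as a 'lens':
# a forwards map on positions plus a family of backwards maps on directions.

p = {'open': {'light', 'dark'}, 'closed': {'none'}}   # p[i]: sense-data per configuration
q = {'on': {0, 1}, 'off': {0}}

forwards = {'open': 'on', 'closed': 'off'}            # φ₁ : p(1) -> q(1)
backwards = {                                          # φ#_i : q[φ₁(i)] -> p[i]
    'open':   {0: 'light', 1: 'dark'},
    'closed': {0: 'none'},
}

# Well-formedness: forwards lands in q's positions, and each backwards map
# sends q[φ₁(i)] into p[i].
for i in p:
    j = forwards[i]
    assert j in q
    assert set(backwards[i]) == q[j]           # domain of φ#_i is q[φ₁(i)]
    assert set(backwards[i].values()) <= p[i]  # codomain of φ#_i is p[i]
print("φ is a valid morphism p -> q")
```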
Now, a morphism $p \to q$ represents simply nesting a $p$-system within a $q$-system; but often, as here, we wish to consider how multiple agents form a coherent collective, which means we need a way to encode multiple agents’ polynomials as a single polynomial. For this, we can use the tensor of polynomial functors, $\otimes$, which places the two interfaces $p$ and $p'$ "side by side". Formally, we define $p \otimes p'$ as the polynomial $\sum_{(i, i') \in p(1) \times p'(1)} y^{p[i] \times p'[i']}$. With this definition, we can understand a morphism $p \otimes p' \to q$ as representing how systems $p$ and $p'$ come together to form a system with boundary $q$.
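The tensor is easily computed in the finite encoding: positions pair up, and directions multiply. A sketch (example polynomials invented for illustration):

```python
from itertools import product as cartesian

# The tensor p ⊗ q of finite polynomials: positions are pairs of positions,
# and the directions at (i, j) are the product p[i] × q[j].

def tensor(p, q):
    return {(i, j): set(cartesian(p[i], q[j])) for i in p for j in q}

p = {'a': {0, 1}, 'b': {0}}   # p(y) = y^2 + y
q = {'c': {0, 1, 2}}          # q(y) = y^3

pq = tensor(p, q)
# (p ⊗ q)(y) = y^(2·3) + y^(1·3) = y^6 + y^3
print(sorted(len(d) for d in pq.values()))  # -> [3, 6]
```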
This is not yet enough for our purposes; we also wish to model systems that recursively predict the beliefs of other agents in their environment. Behaviorally, this means predicting how the other agents are going to act, given their perceptions—which in turn means predicting the patterns of interaction within the environment. And, formally, this means `internalizing’ these patterns into a single polynomial.
Thus, given polynomials $p$ and $q$, we can define the corresponding hom polynomial
$$[p, q] := \sum_{\varphi \in \mathbf{Poly}(p, q)} y^{\sum_{i \in p(1)} q[\varphi_1(i)]} \, .$$
The set of configurations of $[p, q]$ is the set of morphisms $p \to q$, so to adopt a $[p, q]$-configuration is to adopt a particular pattern of interaction. Dually, the “sense-data” associated to a particular pattern of interaction $\varphi$ is given by the configurations of the `inner’ system $p$ and, for each such configuration $i$, the corresponding sense-data for the outer system $q$ in the configuration implied by $i$ via $\varphi$.
A prediction over $[p, q]$ is thus a prediction over the set $\sum_{\varphi \in \mathbf{Poly}(p, q)} \sum_{i \in p(1)} q[\varphi_1(i)]$; that is, a distribution over patterns of interaction $\varphi$, inner configurations $i$, and outer sense-data in $q[\varphi_1(i)]$. By way of example, if we assume that the outer system $q$ is `closed’ (with no further external environment), then that is to say that it has the trivial interface $y$, with only one configuration (`being’) and no non-trivial sense-data. A morphism $p \to y$ corresponds to a function $p(1) \to \sum_{i \in p(1)} p[i]$², which encodes how the environment responds with sense-data, given the $p$-system’s actions. Thus, a prediction over $[p, y]$ is a prediction of the environment’s response, along with a prediction of "how to act".
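For a finite polynomial, the morphisms $p \to y$ can simply be enumerated: each one chooses, for every configuration, one element of that configuration's sense-data set. A sketch (example polynomial invented for illustration):

```python
from itertools import product

# For the 'closed' outer system q = y, a morphism p -> y picks one
# sense-datum from p[i] for each configuration i — a 'section' of p.
# These sections form the configuration set Poly(p, y) of the hom
# polynomial [p, y].

p = {'open': ['light', 'dark'], 'closed': ['none']}

configs = list(p)
sections = [dict(zip(configs, choice))
            for choice in product(*(p[i] for i in configs))]

for s in sections:
    print(s)
# There are |p['open']| * |p['closed']| = 2 * 1 = 2 such morphisms:
# {'open': 'light', 'closed': 'none'} and {'open': 'dark', 'closed': 'none'}
```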
We can use this idea to extend the standard formalism of active inference agents’ internal generative models, by saying that a polynomial generative model for a solitary system with boundary $p$ is a probability kernel (conditional probability distribution) $X \to \mathbf{Poly}(p, y) \times p(1)$ for some choice of internal state space $X$, along with a prior distribution on $X$. This means that, for each $x \in X$, we obtain a distribution over $\mathbf{Poly}(p, y) \times p(1)$: a belief about actions to take, along with a belief about how the environment will respond with sense-data.
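A toy version of such a kernel, in the same finite encoding, assigns to each internal state a distribution over (environment-response, configuration) pairs. All sets and probabilities below are invented purely for illustration:

```python
import itertools

# A toy 'polynomial generative model' for a solitary agent with boundary p:
# a probability kernel from internal states X to Poly(p, y) × p(1).

p = {'look': ['bright', 'dim'], 'rest': ['quiet']}

# Enumerate Poly(p, y): sections choosing a response in p[i] for each i.
sections = [dict(zip(p, choice))
            for choice in itertools.product(*p.values())]

X = ['curious', 'tired']

def kernel(x):
    """For each internal state x, a distribution over (section, configuration)."""
    if x == 'curious':
        return {(s, 'look'): 1 / len(sections) for s in range(len(sections))}
    return {(s, 'rest'): 1 / len(sections) for s in range(len(sections))}

for x in X:
    dist = kernel(x)
    assert abs(sum(dist.values()) - 1.0) < 1e-9  # each output is a distribution
print("kernel defines a distribution over Poly(p, y) × p(1) for each x")
```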
When the system of interest is not solitary, however, it may sensibly imagine its environment to include other agents: thus, the "outer system" no longer has the trivial form $y$, but may itself be modelled using a hom polynomial $[q_1 \otimes \cdots \otimes q_n, y]$, where the $q_j$ are the polynomials representing the other agents. Thus, its generative model is extended to a probability kernel of the form $X \to \mathbf{Poly}(p, [q_1 \otimes \cdots \otimes q_n, y]) \times p(1)$. It is possible to prove an isomorphism of polynomials $[p, [q, y]] \cong [p \otimes q, y]$, and so this model is equivalent to one of the form $X \to \mathbf{Poly}(p \otimes q_1 \otimes \cdots \otimes q_n, y) \times (p \otimes q_1 \otimes \cdots \otimes q_n)(1)$.
Such an agent thus predicts not only its actions, but those of its companion agents, along with how the environment will respond to all of them. The foregoing isomorphism can be repeated to arbitrary levels of nesting, and thus constitutes a starting point for a formal "theory of mind"; and indeed, a starting point for an account of agents that model each other’s protentions.
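The nesting isomorphism invoked above can be seen as a polynomial analogue of currying. A sketch of the reasoning, assuming (as is standard for $\mathbf{Poly}$) that $\otimes$ makes the category symmetric monoidal closed with internal hom $[-,-]$:

```latex
% (Poly, ⊗, y) is symmetric monoidal closed, so for all p, q, r:
%   [p ⊗ q, r] ≅ [p, [q, r]]
% Taking r = y gives the isomorphism used in the text:
[p \otimes q, \, y] \;\cong\; [p, \, [q, y]]
% i.e. a model of "p and q interacting in a closed world" is equivalently
% a model of "p whose environment is q's interaction with a closed world";
% iterating this yields arbitrary levels of nesting.
```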
5.2. A sheaf-theoretic approach to multi-agent systems
In the preceding section, we described how an ensemble of agents may predict each other’s behavior by instantiating a family of polynomial generative models. However, there is nothing in that formalism that pushes the agents’ beliefs to be in any way compatible: they need not share protentions. Indeed, a true collective of agents should be a group of agents that have `overlapping’ world models, sufficiently cohesive to promote the development of common intentions among individuals.
In order to describe agents with such shared beliefs, we propose upgrading the formalism using the mathematical tools of sheaf and topos theory. Sheaves are in some sense the canonical structure for distributed data [32], and tools from sheaf theory allow us to describe agents that communicate in order to reach a consensus [33].
In more detail, a sheaf over a topological space constitutes a systematic method of keeping track of how `local’ data or qualities, defined on open subsets, can be reliably concatenated to represent a `global’ situation. This makes sheaves well suited to capturing the varied, and potentially contradictory, beliefs, perspectives, and predictions of the individual agents in a multi-agent system: they can represent both the diversity of and the agreement among different agents’ perspectives on the environment, and the ability of such data to change over time is crucial for adapting and reacting to changes in the system. Sheaves formalize the concept of “shared experience” among agents, which is essential for reaching a consensus on the structure of the external world.
Mathematically, a sheaf $F$ is an assignment of data sets $F(U)$ to (the open subsets $U$ of) a space $X$, such that the assignment “agrees on overlaps”, meaning that, if we consider overlapping subsets $U$ and $V$ of $X$, then $F(U)$ and $F(V)$ agree on the overlap $U \cap V$. A little more formally, if we consider there to be a morphism $U \to V$ whenever $V \subseteq U$, we obtain a category $\mathcal{O}(X)^{\mathrm{op}}$ whose objects are (open) subsets of $X$ and whose morphisms are such (`opposite’) inclusions. Then a sheaf $F$ is a functor $F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathbf{Set}$ such that, whenever $U$ and $V$ cover $W$ (as when $W = U \cup V$) so that there are morphisms $W \to U$ and $W \to V$ in $\mathcal{O}(X)^{\mathrm{op}}$, then, if $u \in F(U)$ and $v \in F(V)$ agree on $U \cap V$, there is a unique $w \in F(W)$ such that $w$ restricts to both $u$ and $v$. The category $\mathbf{Sh}(X)$ of sheaves on $X$ forms a subcategory of the category of functors $\mathcal{O}(X)^{\mathrm{op}} \to \mathbf{Set}$.
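The gluing condition is easy to exhibit on a toy cover, treating sections over a set of points as functions from those points to values (the cover and data below are invented for illustration):

```python
# A toy check of the sheaf gluing condition on a two-set cover:
# sections glue iff they agree on the overlap, and the gluing is unique.

U = {1, 2, 3}
V = {3, 4}
W = U | V                      # U and V cover W

u = {1: 'a', 2: 'b', 3: 'c'}   # a section u ∈ F(U)
v = {3: 'c', 4: 'd'}           # a section v ∈ F(V)

overlap = U & V
assert all(u[x] == v[x] for x in overlap)   # agreement on U ∩ V

w = {**u, **v}                 # the unique gluing w ∈ F(W)
assert {x: w[x] for x in U} == u            # w restricts to u
assert {x: w[x] for x in V} == v            # w restricts to v
print(w)  # -> {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
```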
Now, a space such as $X$ is itself an object of a category of spaces $\mathcal{E}$, whose morphisms are the appropriate kind of functions between spaces (e.g., continuous functions between topological spaces), and when $\mathcal{E}$ has enough structure (such as when it is a topos), there is an equivalence between $\mathbf{Sh}(X)$ and the category $\mathcal{E}/X$ of bundles over $X$, whose objects are morphisms $\pi \colon E \to X$ in $\mathcal{E}$ and whose morphisms $(E, \pi) \to (E', \pi')$ are functions $f \colon E \to E'$ such that $\pi' \circ f = \pi$. We can use this equivalence³ to lift the models of the previous section to the world of sheaves, as we now sketch.
A bundle $\pi \colon E \to X$ thus may itself be seen as representing a type (or collection) of data that varies over the space $X$; for each $x \in X$, there is a fibre $E_x := \pi^{-1}(x)$ encoding the data relevant to $x$. In this way, each polynomial $p$ yields a `discrete’ bundle $\pi_p \colon \sum_{i \in p(1)} p[i] \to p(1)$, which maps each pair $(i, a)$ to $i$. But the polynomials of the preceding section are in no way related to any ambient spatial structure: for instance, one might expect that the internal space $X$ of an agent’s generative model is structured as a model of the agent’s external environment, which is likely spatial; likewise, the type of available configurations may itself depend on where in the environment the agent finds itself (consider that we might suppose this $X$ to encode also task-relevant information).
This suggests that the agent’s configuration space $p(1)$ should itself be bundled over $X$, so that the polynomial $p$ takes the form of a morphism $p(1) \to X$—or rather, an object in $\mathcal{E}/X$. In order for this to make sense, we need to be able to instantiate the category $\mathbf{Poly}$ in $\mathcal{E}$, rather than $\mathbf{Set}$: but this is possible if $\mathcal{E}$ has enough structure⁴, as we have assumed. Then, we can define a spatial generative model on the interface $p$ over $X$ to be a probability kernel $c \colon X \to \sum_{i \in p(1)} p[i]$ in $\mathcal{E}$ that commutes with the induced bundle projection $\pi \colon \sum_{i \in p(1)} p[i] \to X$, so that $\pi \circ c = \mathrm{id}_X$, along with a prior on $X$. Such a kernel must therefore be a stochastic section of the bundle $\pi$; and with it, an agent can make predictions according to where it believes itself to be in its configuration space, compatibly with the structure of that space.
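In discrete terms, a stochastic section is a kernel that only places probability mass on the fibre over its input, so that pushing forward along the projection returns the input with certainty. A sketch (the bundle below is invented for illustration):

```python
# A toy 'stochastic section' of a discrete bundle π : E -> X,
# where π is the first projection of each pair.

E = [('x0', 'a'), ('x0', 'b'), ('x1', 'c')]   # total space
X = ['x0', 'x1']                              # base space

def c(x):
    """Uniform belief over the fibre π⁻¹(x)."""
    fibre = [e for e in E if e[0] == x]
    return {e: 1 / len(fibre) for e in fibre}

for x in X:
    dist = c(x)
    assert abs(sum(dist.values()) - 1.0) < 1e-9  # c(x) is a distribution
    assert all(e[0] == x for e in dist)          # π ∘ c = id_X, almost surely
print("c is a stochastic section of π")
```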
Now, in this spatially-enhanced setting, we may recapitulate the polynomial theory-of-mind of the preceding section, and suppose that each agent $j$ is equipped with a spatial generative model $c_j$ on an interface $p_j$ over a space $X_j$. If we additionally suppose that the collection of agents’ model spaces $X_j$ covers a (perhaps-larger) space $Z$, then we can in turn ask whether it is possible to glue these models together accordingly, to form a “sheaf of world models” $W$.
If it is possible, then we may say that the agents inhabit a shared universe — and thus, with appropriate generative models, may be said to share protentions. Conversely, if it is not possible, then we may ask: what is the obstruction? In this case, there must be some disagreement between the agents. But sheaf theory supplies tools for overcoming such disagreements [34,35], and thus for communicating to reach a consensus [33]; even if the disagreements are fundamental, it is usually possible to derive dynamics that will yield as close to a sheaf as possible [36]. In future work, we hope to apply these methods to multi-agent active inference models, in order to demonstrate this consensus-building.
5.2.1. A note on toposes
Sheaves collect into categories called toposes. A topos is a category that has both spatial and logical structure [37], allowing for the expression of logical propositions and deductions within it. Topos theory, extending beyond sheaf theory, provides a more holistic and abstract framework. Each topos is like a “categorified space”, and comes with an internal logic and language, whose expressions are relative to the space that the topos models, thereby enabling a more profound exploration of the conceptual structures within these spaces. In this way, each topos can be thought of as a `universe’, where the truth of propositions may depend on where they are uttered. For example, the topos $\mathbf{Sh}(X)$ assumed above represents “the universe of the space $X$”, known as the “little topos” or “petit topos” of $X$.
The tools of sheaf theory naturally extend to toposes. Thus, in the foregoing discussion, we considered a collection of agents with internal world models $X_j$, which in turn induce toposes $\mathbf{Sh}(X_j)$ that we may consider gluing into a “shared universe” according to their topology or interaction pattern. Perhaps, in the end, we may consider this shared universe to be the agents’ understanding of their actual universe, socially constructed.