4.2.1. Notation and Setting
Here we describe a generic protocol that reflects our current processes. It enables an academic team to co-create libraries of BNs in such a way that police are led to their own Bayesian model to support their SEU decision making when pursuing a criminal, while all information that police need to keep secure remains undivulged. We first need to set up some notation.
Any particular incident whose progress is being monitored will involve a certain broad category of suspect (e.g. an IS sympathiser, their age and their history) which will inform the nature and speed of their progress through the different phases of a plot through the suspect’s intent, potential capabilities and MO. We denote this background information by . The academic team will be able to discover from open source publications how police may be able to categorise a given suspect, although the nature of such information may be covert and known only to police. A second classifier effectively concerns the environment in which the suspect operates, such as place of residence, which might affect the capability and ease of aborting a plot - information that is more easily exchanged across the firewall. Let denote this information. We expect that all CPTs will need to be indexed by , although most of these will be shared across categories . Because the academic team will only need to demonstrate their code within their own library, they typically need to only code up one such category. For this reason, we have suppressed this indexing in the development below. However, in the parallel library developed by police, they will need to elicit different entries for each CPT for each such category.
Denote by
an arbitrary library of fully embellished probability models - called
entries - where these entries have been ordered consistently with their arrival in the library. Here,
where
denotes the DAG of the BN and
- with
the vertex set of
- the collection of CPTs needed to embellish
into a full probability model,
. In [4] and [12], by defining an appropriate causal algebra, we demonstrate how the academic team can design the graphs
so that these provide a valid framework to describe how events might unfold not only when police do not intervene (decision
) but also when they do (decision
,
). So in this sense
will provide a suitable framework to describe the structure of
any probability model for all contemplated interventions and category of suspect-environment pair
.
Despite this useful structural invariance, in order to build a full probability model for each intervention, police will also need to specify for each category
the CPTs
associated with such decisions
- where we write
. Henceforth we focus on building the library for a fixed category
. To establish their library, police will then need to repeat the process below for all other categories of suspect
.
It is convenient to partition the sets
of CPTs into the three sets
, where
denotes those CPTs whose prior information and informative data sets can be shared with the academic team;
those CPTs for which the academic team have some information and perhaps data informing these but for which police have additional information; and
those CPTs which the academic team have only scant information about and which police would plan to overwrite with their own secret but much more reliable information. Let
denote those sets of CPTs for a new entry which have been elicited as different from any yet to appear in the library - i.e. for
, with
indexing the models in the library, define
Note that, for this construction to make sense, we have assumed we have labelled the vertices in
, such that vertices with the same index have the same meaning across different entries in the library - here across different criminal plots. In the protocol described below, academics will need to craft the names of vertices so that these are as generic as possible, making the association across different entries in the libraries as fluid as it can be. We note that it is often necessary to revisit a generic naming of vertices so that the meaning of the vertices continues to apply to all entries (see e.g. [14]). We also assume that vertices, if they appear in two different graphs, will be ordered compatibly with each other. Thirdly, in the protocol defined below, we will assume that the graphs describing the criminal missions - here plots - have been chosen by academics so that these will be causal in the senses we have discussed above. This will mean that, for each category of crime, the graphs of the progress of the criminal mission will be respected before and after any intervention police might contemplate.
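The bookkeeping described above, namely partitioning CPTs by how far they can be shared and flagging the CPTs of a new entry that differ from any already in the library, can be sketched in a few lines. This is only an illustrative sketch under our own assumptions: the dictionary representation, the class and function names, the vertex names and all probabilities are hypothetical and are not taken from any library described here.

```python
from enum import Enum


class Sharing(Enum):
    OPEN = "open"        # priors and data can be shared with the academic team
    PARTIAL = "partial"  # academics hold some information, police hold more
    SECRET = "secret"    # police overwrite with their own secure information


def partition_cpts(sharing_of):
    """Split vertex labels into three disjoint sets, one per sharing class."""
    parts = {cls: set() for cls in Sharing}
    for vertex, cls in sharing_of.items():
        parts[cls].add(vertex)
    return parts


def new_cpts(entry_cpts, library):
    """Vertices whose CPT differs from every CPT stored for that vertex so far."""
    fresh = set()
    for vertex, table in entry_cpts.items():
        seen = [old[vertex] for old in library if vertex in old]
        if all(table != t for t in seen):
            fresh.add(vertex)
    return fresh


# Hypothetical library with one existing entry; each CPT is a tuple of rows.
library = [
    {"intent": ((0.7, 0.3),), "capability": ((0.6, 0.4), (0.2, 0.8))},
]
# Hypothetical new entry: one CPT reused, one changed, one for a new vertex.
entry = {
    "intent": ((0.7, 0.3),),
    "capability": ((0.5, 0.5), (0.2, 0.8)),
    "travel": ((0.9, 0.1),),
}
fresh = new_cpts(entry, library)
```

The sketch relies on the labelling assumption made above: a vertex label must carry the same meaning in every entry, otherwise comparing tables by label is meaningless.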
Because the libraries we construct have entries that describe similar crimes, logic often demands, or at least makes it plausible to assume, that the dependence structures they express through the topology of the graphs within a library are shared. We have also argued that some of their CPTs will be shared. It is therefore helpful to introduce a notation which can reflect these commonalities over the k models already located within the library. So let denote the graph with vertex set with a directed edge from to v in the edge set if and only if the edge lies in at least one of the edge sets , . Similarly, where , let for some denote the set of those CPTs that are shared by all graphs in the library . Note that a necessary condition for is that v has the same parents in each , so that all CPTs have the same dimensions. This notation sets up a way to find these common CPTs in a large library to aid the construction of priors for a new library entry.
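The two constructions in the preceding paragraph, the graph whose edges are those appearing in at least one entry and the set of CPTs shared by every entry, can be illustrated with a small sketch. The representation of an entry as a pair of dictionaries, the function names, the vertex names and the probabilities are all assumptions made purely for illustration.

```python
def union_graph(entries):
    """Vertices and edges of the graph keeping every edge found in at least one entry."""
    vertices, edges = set(), set()
    for entry in entries:
        for child, parents in entry["parents"].items():
            vertices.add(child)
            vertices.update(parents)
            edges.update((parent, child) for parent in parents)
    return vertices, edges


def shared_cpts(entries):
    """CPTs that are identical, with identical parent sets, in every entry."""
    if not entries:
        return {}
    first, rest = entries[0], entries[1:]
    shared = {}
    for v, table in first["cpts"].items():
        if all(
            v in e["cpts"]
            and e["parents"].get(v) == first["parents"].get(v)  # same parents
            and e["cpts"][v] == table                           # same table
            for e in rest
        ):
            shared[v] = table
    return shared


# Two hypothetical plot models; parents are ordered tuples, CPTs are tuples of rows.
e1 = {
    "parents": {"intent": (), "capability": ("intent",), "attack": ("capability",)},
    "cpts": {
        "intent": ((0.7, 0.3),),
        "capability": ((0.6, 0.4), (0.2, 0.8)),
        "attack": ((0.1, 0.9), (0.5, 0.5)),
    },
}
e2 = {
    "parents": {"intent": (), "capability": ("intent",),
                "attack": ("intent", "capability")},
    "cpts": {
        "intent": ((0.7, 0.3),),
        "capability": ((0.6, 0.4), (0.2, 0.8)),
        "attack": ((0.1, 0.9), (0.3, 0.7), (0.4, 0.6), (0.8, 0.2)),
    },
}
vertices, edges = union_graph([e1, e2])
common = shared_cpts([e1, e2])
```

In this toy example the CPT for "attack" is excluded from the shared set because its parent set differs between the two entries, which is the necessary condition noted above: tables with different parents do not even have the same dimensions.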
Before the construction of the library begins, the academic co-creators train at least one member of the police team so that they are able to elicit from colleagues the probabilities that might be needed for CPTs whose values must remain behind the firewall. Typically, the academic team would encourage them to sign up for one of several open courses in probabilistic elicitation methods. Some of these methods include variations of the Delphi method (see e.g. [15]), Cooke’s classical method [16] and the IDEA protocol [17], all of which involve asking groups of experts for their probabilistic judgements and evaluating these judgements over a number of stages. These tend to rely on the mathematical aggregation of experts’ judgements because a consensus is not usually reached naturally among the experts. However, an expert panel formed of members of police teams who are accustomed to working together is more likely to reach a consensus about probabilistic judgements surrounding an ongoing criminal plot. Therefore, elicitation methods focused on behavioural aggregation may be favoured over the aforementioned methods. The main elicitation technique of this kind is the Sheffield Elicitation Framework (SHELF) [18], in which a facilitator guides the sharing of information and leads group discussions with the aim of reaching a consensus among experts. Far more detail on these elicitation methods can be found in [19, 20]. Members of the in-house police team now trained in such an elicitation method would then receive more customised training via the academic co-creation team. Such activities might involve the academic team engaging them in elicitations of the CPTs within the first iteration of the academic library, which is then donated to police as a template of the model behind the firewall.
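As a rough illustration of what mathematical aggregation of experts' judgements can look like, the sketch below implements a weighted linear opinion pool for one row of a CPT. This is a generic technique chosen by us for illustration, not a description of any of the cited protocols: Cooke's classical method, for instance, derives performance-based weights from calibration questions, a step the sketch does not attempt. The function name and the numbers are hypothetical.

```python
def linear_pool(judgements, weights=None):
    """Weighted average of experts' probability vectors over the same outcomes."""
    n = len(judgements)
    if weights is None:
        weights = [1.0] * n  # equal weights unless told otherwise
    total = sum(weights)
    k = len(judgements[0])
    if any(len(j) != k for j in judgements):
        raise ValueError("all judgements must cover the same outcomes")
    return [
        sum(w * j[i] for w, j in zip(weights, judgements)) / total
        for i in range(k)
    ]


# Three hypothetical experts give probabilities for a binary vertex.
experts = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
pooled_equal = linear_pool(experts)
# Hypothetical unequal weights, e.g. reflecting differing track records.
pooled_weighted = linear_pool(experts, weights=[2.0, 1.0, 1.0])
```

Because the weights are normalised inside the function and each input sums to one, the pooled vector also sums to one. Behavioural aggregation, as in SHELF, replaces this arithmetic step with facilitated discussion.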
Denote the library built up by academics outside the firewall (police behind the firewall) on the iteration of development by , and use the same labelling convention for all entries and their pairs within these libraries. We write the existing entries in the libraries as , but these libraries may be empty. We are now able to describe a protocol for co-creating the next entry into this library.
4.2.5. Maintenance of the Library as Environments Change
Once an entry within the in-house library is functioning appropriately, it is likely to need maintenance, because criminal processes and the ways they are most likely to be perpetrated evolve. However, because of the way these models have been structured, a graphical model is relatively easy to maintain once it has been set up. For example, in typical plot models, the phases a criminal needs to traverse and the speed at which they can do so usually do not evolve quickly, so the associated task probabilities change only at a moderate pace. The types of tell-tale signs measured by intensities do change quickly, of course, in light of both technological refinements and the evolving MO of various potential criminals. But these issues can be addressed simply through regular in-house refreshing of the CPTs and occasional small topological modifications of the BN, and police will have in-house specialists to facilitate this. Again, because the academic team have seeded the code, they can actively engage in this maintenance - providing new inputs to any open source components of the code and methodological advice about how to refresh all parts of the model.
Such maintenance is vital to perform regularly but costs police relatively few resources once the initial system is in place. Again, because of the parallel development, much of the necessary updating can be delegated to the academic team, while those elements that need to be securely protected can be routinely updated in-house, guided by the well-informed academic team.