In addition to the perception module, we have developed an adaptive learning module that enhances the AV’s ability to respond to its surroundings. This module continuously analyzes the agent’s interactions with the environment and adapts the AV’s responses accordingly by acquiring incremental knowledge of the evolving context. This approach ensures the AV can proactively anticipate changes, make better decisions, and respond adeptly. Integrating the adaptive learning module represents a significant step forward in promoting an adaptive interaction between the AV and its surroundings. The module comprises two components: the World Model and the Active Model.
3.2.1. World Model
The World Model (WM) acts like a simulator in the brain, providing insights into how the brain learns to execute sensorimotor behaviors [36]. In the proposed architecture, the WM is formulated using generative models, leveraging interactive experiences derived from multimodal sensory information. Initially, the WM is established by the Situation Model (SM), which serves as a fundamental input module (see Figure 2). The SM, represented as a Coupled GDBN (C-GDBN), models the motions and dynamic interactions between two entities in the environment, enabling the estimation of the vehicles' intentions through probabilistic reasoning. This C-GDBN encodes the gathered sub-optimal information concerning the interaction of an expert AV with one vehicle in the same lane, where the expert changes lanes to overtake the other vehicle without a collision (comprehensive details on structuring the SM can be found in our previous work [37]). To initialize the WM from the provided SM, we transfer the expert's knowledge to a First-Person perspective, in which an intelligent learning vehicle learns by interacting with its environment, observing the expert's behaviour and integrating the gained knowledge into its understanding of the surroundings.
The First-Person model (FP-M) establishes a dynamic model that shifts the learning vehicle from a third-person viewpoint to a first-person experience. This allows it to perceive driving tasks as the expert does, enhancing its imitation accuracy, and empowers it to react promptly during interactions with the other vehicle. The FP-M's structure is derived by translating the hierarchical levels of the SM into the FP context (as illustrated in Figure 3). The top level of the FP-M hierarchy denotes pre-established configurations derived from the dynamic behaviour of how the expert and the other vehicle interact in the environment. Each configuration represents a joint discrete state: a latent discrete state that evolves from the previous state through a non-linear state evolution function, representing the transition dynamic model, perturbed by Gaussian process noise. The discrete state variables jointly represent the discrete states of the two vehicles, which are learned according to the approach discussed in [38], while the dictionary is the set of all possible joint discrete states (i.e., configurations), with m denoting the total number of configurations. Therefore, by tracking the evolution of these configurations over time, it is possible to determine the transition matrix that quantifies the likelihood of transitioning from one configuration to the next.
Each entry of this transition matrix represents the probability of transitioning from configuration i to configuration j, with each row of the matrix summing to one.
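The transition-matrix estimation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name and the integer encoding of configurations are assumptions.

```python
import numpy as np

def estimate_transition_matrix(config_sequence, m):
    """Estimate an m x m transition matrix by counting how often
    configuration i is immediately followed by configuration j,
    then row-normalizing so each visited row sums to one."""
    counts = np.zeros((m, m))
    for i, j in zip(config_sequence[:-1], config_sequence[1:]):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # avoid division by zero for unvisited rows
    return counts / row_sums

# Example: a short tracked sequence over m = 3 configurations
seq = [0, 0, 1, 2, 1, 2, 2, 0]
Pi = estimate_transition_matrix(seq, 3)
```

Each row of the resulting matrix is a categorical distribution over the next configuration, which is what the higher-level prediction step of the filter consumes.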
The hidden continuous states in the FP-M represent the dynamic interaction in terms of a generalized relative distance, consisting of the relative distance and the relative velocity between the two vehicles. The initialization is based on the SM, where the continuous latent state represents a joint belief over the hidden generalized states (GSs) of the two vehicles; each GS consists of a vehicle's position and velocity. The continuous variables evolve from the previous state through a linear state function, parameterized by a state evolution matrix and a control unit vector, and perturbed by Gaussian noise.
Likewise, the observations in the FP-M depict the measured relative distance between the two vehicles: the generalized observation is generated from the latent continuous states by a linear function corrupted by Gaussian noise. Since the observation transformation is linear, there exists an observation matrix mapping the hidden continuous states to the observations.
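The linear evolution and observation models described above can be illustrated with a small sketch. The numerical matrices below (a constant-velocity evolution matrix, a control vector acting on the relative velocity, and an identity observation matrix) are assumptions for illustration only; the paper does not specify their form.

```python
import numpy as np

# Hypothetical 2-D generalized state: [relative distance, relative velocity].
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])   # assumed state evolution matrix (constant-velocity form)
B = np.array([0.0, dt])      # assumed control unit vector acting on relative velocity
H = np.eye(2)                # observation matrix: states measured directly

def predict_state(x_prev, u):
    """Noise-free part of the linear evolution x_k = A x_{k-1} + B u_k."""
    return A @ x_prev + B * u

def generate_observation(x, obs_noise_std=0.0, rng=None):
    """Generalized observation z_k = H x_k + v_k with Gaussian noise v_k."""
    rng = rng or np.random.default_rng(0)
    return H @ x + rng.normal(0.0, obs_noise_std, size=x.shape)

x = predict_state(np.array([10.0, -2.0]), u=0.0)   # gap closing at 2 m/s
z = generate_observation(x)                         # noise-free here
```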
Consequently, within the FP framework, the learning vehicle can reproduce the anticipated interactive maneuvers, which serve as a benchmark for assessing its interaction with the other vehicle.
3.2.2. Active Model
The Active First-Person model (AFP-M) links the WM to the decision-making framework associated with the learning vehicle's behavior. This connection is achieved by augmenting the FP-M with active states representing the vehicle's movements in the environment. Consequently, the AFP-M represents a Generative Model, illustrated graphically in Figure 4, which is conceptualized based on the principles of a partially observable Markov decision process (POMDP). The AFP-M encompasses joint probability distributions over observations, hidden environmental states at multiple levels, and the actions executed by the vehicle. In the context of a POMDP, the agent relies on observations to infer the actual environmental states, which are not directly perceived; it forms beliefs about the hidden environmental states at both the discrete and continuous levels, which evolve according to the learned dynamic models; and it engages with its surroundings by choosing actions that minimize abnormalities and prediction errors.
In the initial stage of the process, the agent employs prior probability distributions to predict the environmental states. The methodological framework for this prediction is a hybrid Bayesian filter, specifically the modified Markov jump particle filter (M-MJPF) [39], which integrates the functionalities of both the particle filter (PF) and the Kalman filter (KF). As the process progresses beyond the initial stage, the agent leverages the previously accumulated knowledge about the evolution of configurations, encoded in the transition matrix outlined in (2). The PF propagates N particles, each assigned equal weight and drawn from the importance density distribution, forming a particle set. Concurrently, a KF is employed for each particle in the set, facilitating the prediction of the corresponding continuous GSs. The prediction of these GSs is directed by the higher (discrete) level, as indicated in Equation (4). The posterior distribution associated with these predicted GSs is obtained by combining the top-down prediction with previously propagated diagnostic messages.
The diagnostic message is the one propagated after the observation at the previous time step. This mechanism plays a crucial role in the algorithm: upon receiving a new observation, a series of diagnostic messages is propagated in a bottom-up manner to update the agent's belief about the hidden states. The belief in the GSs is updated accordingly; in parallel, the belief in the discrete hidden states is refined by adjusting the weights of the particles according to a discrete probability distribution based on the Bhattacharyya distance, a measure used to quantify the similarity between two probability distributions. The probability distribution of the observation message is assumed to be Gaussian, characterized by a mean vector and a covariance matrix.
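The particle re-weighting step can be sketched as follows, assuming Gaussian messages. The exponential mapping from Bhattacharyya distance to weight is an illustrative choice, not necessarily the exact rule used in the M-MJPF.

```python
import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def update_particle_weights(weights, predicted, observed):
    """Re-weight particles by similarity (small Bhattacharyya distance
    -> high weight) between each particle's predicted Gaussian and the
    Gaussian diagnostic message from the observation level."""
    mu_o, cov_o = observed
    new_w = np.array([w * np.exp(-bhattacharyya_gaussian(mu_p, cov_p, mu_o, cov_o))
                      for w, (mu_p, cov_p) in zip(weights, predicted)])
    return new_w / new_w.sum()
```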
The decision-making process of the learning vehicle hinges on its ability to choose between exploration and exploitation, depending on its interaction with the external environment. This discernment is predicated on the detection of observation anomalies. Specifically, the vehicle assesses its current state by analyzing the nature of its interactions. In scenarios where the observations align with familiar or normal patterns, the vehicle solely observes the expert. Conversely, in instances characterized by novel or abnormal observations, the vehicle faces a more complex situation involving multiple agents (e.g., two dynamic agents). This latter scenario represents a deviation from the experiences encapsulated in the expert demonstrations (see Figure 5 and Figure 6). Consequently, based on this assessment, the vehicle opts for an action informed by its interpretation of environmental awareness and the perceived need for exploration or exploitation, as expressed in (10).
In (10), under normal observation, the learning vehicle imitates the expert's action, selected from the active inference table, whose entries specify the probability of selecting each action from the set of available actions conditioned on being in a given configuration. The action is drawn from the row corresponding to the configuration of the particle with the maximum weight.
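A minimal sketch of this imitation step, assuming the active inference table is stored as a row-stochastic array indexed by configuration; the greedy (most-likely) action choice is an illustrative simplification.

```python
import numpy as np

def select_action_normal(weights, particle_configs, ai_table):
    """Under normal observations: pick the configuration of the particle
    with maximum weight, then take the most likely action from that row
    of the active inference table (greedy imitation of the expert)."""
    n_star = int(np.argmax(weights))      # index of the max-weight particle
    config = particle_configs[n_star]      # its discrete configuration
    action_probs = ai_table[config]        # P(action | configuration)
    return int(np.argmax(action_probs))

# Hypothetical table: 2 configurations x 3 actions
ai_table = np.array([[0.7, 0.2, 0.1],
                     [0.1, 0.1, 0.8]])
```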
In (10), if the learning vehicle encounters a situation that is abnormal and has not been seen by the expert before, it explores new ways to act. It does so by calculating the Euclidean distance, i.e., the shortest distance, between itself and the other vehicle when they are in the same lane. Based on the measured distance, the vehicle adjusts its speed (e.g., by slowing down or braking) so as not to exceed the speed of the vehicle ahead, helping to prevent a collision.
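The abnormal-case speed adjustment can be sketched as a simple rule. The braking factor and safety threshold below are assumptions, since the text only states that the vehicle must not exceed the lead vehicle's speed and should slow down or brake based on the measured distance.

```python
def adjust_speed_abnormal(own_speed, lead_speed, distance, safe_distance):
    """In an unseen (abnormal) situation, cap the ego speed so it never
    exceeds the lead vehicle's speed, braking harder when the measured
    Euclidean distance falls below a safety threshold (assumed policy)."""
    if distance < safe_distance:
        return min(own_speed * 0.5, lead_speed)   # brake: halve the speed
    return min(own_speed, lead_speed)             # do not exceed the lead
```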
The predictive messages are propagated top-down through the hierarchy. At the same time, the AFP-M receives sensory responses in the form of diagnostic messages that move from the bottom level of the hierarchy upwards. Calculating the multi-level Free Energy (FE) quantifies how well the current observations match the model's predictions.
At the discrete level, FE is measured as the discrepancy between the predictive and diagnostic messages entering the discrete node. Since these messages take the form of discrete probability distributions, we propose using the Kullback-Leibler divergence [40] to measure the probabilistic distance between them.
At the continuous level, FE is conceptualized as the distance between the probabilistic messages arriving at the continuous node. This is computed as the Bhattacharyya distance between the predictive message and the diagnostic message originating from the observation level, defined in terms of the Bhattacharyya coefficient.
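The two divergences used for the multi-level FE can be computed as follows for discrete distributions. This sketch assumes normalized probability vectors; in the paper, the continuous-level messages are continuous densities, for which the Bhattacharyya coefficient is an integral rather than a sum.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence between two discrete distributions
    (discrete-level FE); eps guards against log(0)."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def bhattacharyya_distance_discrete(p, q):
    """D_B = -ln(BC), where BC = sum_i sqrt(p_i * q_i) is the
    Bhattacharyya coefficient (continuous-level FE analogue)."""
    bc = np.sum(np.sqrt(np.asarray(p) * np.asarray(q)))
    return float(-np.log(bc))
```

Both quantities are zero when prediction and observation agree and grow as the messages diverge, which is what flags an abnormality.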
Furthermore, Generalized Errors (GEs) facilitate understanding of how to suppress such abnormalities in the future. The GE associated with (13), conditioned upon the transition from the previous configuration, is defined over an aleatory variable characterized by a discrete probability density function (pdf). The errors identified at the discrete level are then conveyed to the observation level, where they are essential for computing the generalized error that explains the emergence of a new interaction within the surroundings.
By integrating the principles of adaptive learning and active inference, our objective is to minimize the occurrence of abnormalities. This goal can be achieved either by constructing a robust and reliable WM or by actively adapting to the dynamics of the environment. Such an approach ensures a comprehensive understanding of, and interaction with, environmental variables, thereby enhancing the system's predictive accuracy and responsiveness. The information acquired in the previous phases is utilized to modify the learning vehicle's beliefs and to incrementally expand its knowledge of the environment. This process involves updating the active inference table and expanding the transition matrix. These updates also take into account the abnormality of the observation, so as to consider the similarity between two configurations.
In situations involving abnormalities, the learning vehicle incrementally encodes the novel experiences in the WM by updating both the active inference table and the transition matrix. It is important to note that during such abnormal situations, the vehicle may encounter scenarios involving configurations not previously experienced. These configurations are characterized by new relative distances between the vehicle and other dynamic objects in its environment, differing from the configurations previously known to the expert. Discovering and understanding these new configurations enables the vehicle to learn and adapt, thereby enhancing its ability to respond to similar situations in the future.
Consequently, a set of relative distance-action pairs can be collected during the abnormal period T (i.e., during exploration), where n is the total number of newly acquired configurations. The newly experienced action-configuration pairs are then encoded in a new active inference table.
Similarly, by analyzing the dynamic evolution of these new configurations, it becomes possible to estimate their transition probabilities and encode them in a new transition matrix. Consequently, the updated global transition matrix combines the original transition matrix with the newly acquired one.
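One plausible way to combine the original and newly learned transition matrices into a global matrix is a block composition; the optional coupling mass allowing jumps between old and new configurations is an assumption, as the paper's exact composition rule is not reproduced here.

```python
import numpy as np

def expand_transition_matrix(Pi_old, Pi_new, coupling=0.0):
    """Combine the original m x m matrix and the newly acquired n x n
    matrix into a global (m+n) x (m+n) matrix. The two blocks sit on the
    diagonal (an assumption); `coupling` reserves probability mass for
    jumps between old and new configurations, keeping rows stochastic."""
    m, n = Pi_old.shape[0], Pi_new.shape[0]
    Pi = np.zeros((m + n, m + n))
    Pi[:m, :m] = Pi_old * (1 - coupling)
    Pi[m:, m:] = Pi_new * (1 - coupling)
    if coupling > 0:
        Pi[:m, m:] = coupling / n   # uniform mass toward new configurations
        Pi[m:, :m] = coupling / m   # uniform mass back toward old ones
    return Pi
```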
The learning vehicle evaluates the performed action at each time step using the FE defined in (13) and (14). In abnormal conditions, it learns future behaviors by gathering information about its surrounding environment.
During the online learning procedure, the learning vehicle modifies and updates the current active inference table and transition matrix based on the diagnostic messages. Additionally, the transition matrix is refined using the GE defined in (15).
The active inference table can be adjusted row by row, where each update modifies the specific row corresponding to the current configuration using the pdf of the GE associated with the active states.
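A sketch of this row-wise table adjustment, assuming a convex combination between the current row and the GE pdf followed by renormalization; the learning rate and the update form are assumptions, since the paper's exact update equation is not reproduced here.

```python
import numpy as np

def update_ai_table_row(ai_table, config, ge_pdf, lr=0.5):
    """Adjust the row of the active inference table for `config` by
    blending it with the pdf of the generalized error over active states
    (assumed convex-combination rule), then renormalize the row."""
    row = (1 - lr) * ai_table[config] + lr * np.asarray(ge_pdf)
    ai_table[config] = row / row.sum()
    return ai_table

table = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
table = update_ai_table_row(table, 0, ge_pdf=[0.0, 0.5, 0.5])
```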