Bayesian optimization is a powerful tool that allows the user to develop a framework for efficiently solving learning problems [15]. Satisfactory results can be obtained within even fewer than 100 iterations [16,17]. Furthermore, it tolerates stochastic noise in function evaluations and is best suited for the optimization of systems of small to medium size, typically with fewer than 20 dimensions [18]. Bayesian optimization is being continuously developed; for an overview of recent advances of the algorithm the reader can refer to [19]. Due to its learning efficiency and noise tolerance, it has great potential for industrial implementation, for example in process systems [20,21], positioning systems [22] and robotics [23].
4.1. A Short Discussion on Model Learning for Control
With regard to the control of dynamic systems, Bayesian optimization has recently gained increased attention. A great part of this work focuses on controller parameter tuning [24,25]. In comparison, leveraging BO for model learning is not extensively researched [26,27]. Among the few publications, many focus on learning a residual model that complements the linear model in a model-based control scheme in order to achieve better control quality. In [28], the authors use a GP to learn the relation between the adaptive term and the modelling error in Model Reference Adaptive Control. In [29], a GP is used for real-time model adaptation, minimizing the error between prediction and measurement, as a straightforward extension of robust control. Similarly, a GP is used to approximate the unknown part of the nonlinear model in [27,30]. According to [31], it is often estimated that 75% of the cost associated with an advanced control project goes into system modeling. From a practical point of view, the methods proposed in the literature do not significantly reduce the modeling effort, since a static nominal model still needs to be identified; on top of that, additional effort is required for the learning procedure. This can be a major obstacle, namely a motivation stopper, for the industrial implementation of the proposed advanced controller optimization algorithms.
In general, the majority of the literature in this regard handles the modelling based on the separation principle and certainty equivalence in cases where state estimation is involved. As stated in [32], "a guiding principle should be to model as well as possible before any model or controller simplifications are made as this ensures the best statistical accuracy". The general consensus in the framework of data-driven control is therefore that a perfect model-plant match is also needed. Only very few exceptions in the literature discuss direct performance-driven model learning under closed-loop conditions [33]; there, a model-plant mismatch is possible but not necessarily examined. In this approach, no or very little prior knowledge is required at very low implementation cost, and the model parameters are used directly and purely for the optimization of the control performance.
Within the LQT control framework, it turns out that, for certain systems, the controller with the learnt model can provide significantly more stable solutions than the controller with perfect knowledge of the system, even though a model-plant mismatch is inherently very likely in the former case. This fact raises two concerns. The first concern is over-fitting. However, it has been shown that with carefully designed experiments one can learn a data-driven model for control which excites the dynamic spectrum of interest. This is a very useful insight. In fact, the authors encourage readers to use the system itself to (semi-)automatically generate the information of interest [32]. A second concern is the stability margin. For the LQG controller, as famously stated in [34], there is none guaranteed for this class, meaning that the stability margin is always system specific. In practical use, we carefully examine the robustness of the controller for each individual case. For CiL, the test scenarios can be categorized into several dynamic ranges and represented by certain test signals; in this case, the robustness is examined directly in the experimental setup. Examples of the systematic study of robustness and its integration into data-driven control algorithms can be found in [24,30,35]. This is not in the scope of this paper. The intention of the authors is to showcase the possibilities opened up by this new approach, with a focus on industrial practicability.
4.2. Bayesian Optimization with Gaussian Process as Surrogate Model
As hinted in the previous section, in this contribution we focus on using BO to learn a system model dedicated to the control task. In other words, it is a performance-driven learning scheme [36]. The learning procedure is summarized in Algorithm 1 and will be explained in detail in the following.
Essentially, BO is an efficient way to learn the relationship between the model parameters $\theta$ and the control performance $J$ based on past observations. With random initial model parameters $\theta_1$, the controller is instantiated and control sequences are applied to the dynamic system. We then evaluate the control performance with respect to input energy and tracking error by computing a predefined cost function. Then we obtain our first observation $D_1(\theta_1)$, with the argument hinting at the dependency of the observation on the model parameters. To simplify the notation, in the following, when no misunderstanding can occur, $D_1(\theta_1)$ will simply be noted as $D_1$. Notice also that without any suffix, the observation $D$ represents the whole set of current observations; the same rule applies to the other mathematical notations.
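To make this data-gathering step concrete, the following minimal Python sketch shows how one such observation could be generated. The scalar plant, the model-based tracking law and the cost weighting are purely illustrative assumptions, not the system or controller considered in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins (not the system used in this work): a scalar
# discrete-time plant x[k+1] = a_true*x[k] + b_true*u[k], and a tracking
# law derived from the *model* parameters theta = (a_hat, b_hat).
a_true, b_true = 0.9, 0.5
ref = np.ones(50)                       # constant reference to track

def evaluate_performance(theta):
    """Instantiate a controller from the model parameters, run the closed
    loop, and return the scalar cost J (tracking error plus input energy)."""
    a_hat, b_hat = theta
    x, J = 0.0, 0.0
    for r in ref:
        u = (r - a_hat * x) / b_hat     # model-based tracking control law
        x = a_true * x + b_true * u     # response of the true plant
        J += (r - x) ** 2 + 0.1 * u ** 2
    return J

# First observation D_1 = (theta_1, J(theta_1)) from random initial parameters
theta_1 = rng.uniform([0.5, 0.1], [1.0, 1.0])
D = [(theta_1, evaluate_performance(theta_1))]
```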
We assume that the control performances are random variables that have a joint Gaussian distribution dependent on the model parameters. Without any observation, we define a GP prior, which is a Gaussian distribution over functions, completely parameterized by its mean and covariance. Then we can draw samples from it, which serve as candidate functions for the relationship we are looking for. The actual observation $D$ is a sample from the distribution of $f(\theta)$. In Bayesian learning, we use the observations to re-weight these function candidates. The probability of a certain candidate function $f$ under the prior is defined as $p(f)$; the Bayes rule

$$p(f \mid D) = \frac{p(D \mid f)\, p(f)}{p(D)}$$

scales this probability by a factor. The numerator of the scaling factor describes the likelihood of the observations given the candidate function. It is normalized by the average likelihood, in other words the overall probability of our observations over all possible candidate functions.
We are interested in finding the posterior $p(f \mid D)$ because we want to make predictions $f_*$ at the unobserved locations $\theta_*$, which in turn will be evaluated by the acquisition function to decide where the next iteration goes. We compute the posterior from the prior and the likelihood. By applying the Gaussian marginalization rule we obtain

$$p(f_* \mid D) = \int p(f_* \mid f)\, p(f \mid D)\, \mathrm{d}f$$
Both $f_*$ and the observations $D$ are Gaussian; by unfolding the definition of covariance and the linearity of expectation, their joint distribution can be formulated as follows:

$$\begin{bmatrix} D \\ f_* \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} m(\theta) \\ m(\theta_*) \end{bmatrix},\; \begin{bmatrix} K(\theta, \theta) & K(\theta, \theta_*) \\ K(\theta_*, \theta) & K(\theta_*, \theta_*) \end{bmatrix} \right) \qquad (23)$$

By applying the conditional rule for multivariate Gaussians, the distribution of the posterior is obtained:

$$f_* \mid D \sim \mathcal{N}(\mu_*, \Sigma_*) \qquad (24)$$
with the posterior mean $\mu_*$:

$$\mu_* = m(\theta_*) + K(\theta_*, \theta)\, K(\theta, \theta)^{-1}\, \bigl(D - m(\theta)\bigr) \qquad (25)$$

and posterior covariance $\Sigma_*$:

$$\Sigma_* = K(\theta_*, \theta_*) - K(\theta_*, \theta)\, K(\theta, \theta)^{-1}\, K(\theta, \theta_*) \qquad (26)$$
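As an illustration, the conditioning steps (25) and (26) reduce to a few lines of linear algebra. The sketch below assumes a zero prior mean, uses the squared exponential kernel introduced further below, and adds a small jitter term purely for numerical stability; the hyperparameter and data values are arbitrary.

```python
import numpy as np

def se_kernel(A, B, length_scale=0.3, signal_var=1.0):
    """Squared exponential kernel k(a, b) = sigma_f^2 * exp(-||a - b||^2 / (2 l^2))."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dists / length_scale ** 2)

def gp_posterior(theta_train, J_train, theta_test, jitter=1e-8):
    """Posterior mean (25) and covariance (26) at the test locations,
    assuming a zero prior mean."""
    K = se_kernel(theta_train, theta_train) + jitter * np.eye(len(theta_train))
    K_s = se_kernel(theta_test, theta_train)          # K(theta_*, theta)
    K_ss = se_kernel(theta_test, theta_test)          # K(theta_*, theta_*)
    alpha = np.linalg.solve(K, J_train)               # K(theta, theta)^{-1} D
    mu = K_s @ alpha                                  # Eq. (25) with m(.) = 0
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)      # Eq. (26)
    return mu, cov

# Toy usage: three observed parameter sets (2-D) and their costs
theta_train = np.array([[0.6, 0.2], [0.8, 0.5], [0.9, 0.9]])
J_train = np.array([4.2, 1.3, 2.7])                   # illustrative cost values
theta_test = np.array([[0.7, 0.4]])
mu, cov = gp_posterior(theta_train, J_train, theta_test)
```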
The steps of (23), (25) and (26) are a recurring pattern in Bayesian learning, and we use them to compute the posterior for the Gaussian process model. By assuming zero mean and expanding (25) we have

$$\mu_* = K(\theta_*, \theta)\, K(\theta, \theta)^{-1}\, D \qquad (27)$$

The expanded mean (27) reveals that, for an arbitrary test location $\theta_*$, the posterior is a weighted sum of all the observations $D$, normalized by $K(\theta, \theta)$. The weights are defined by the kernel between this test location $\theta_*$ and all training locations in $D$. A typical kernel function is the squared exponential kernel with the following structure:

$$k(\theta_i, \theta_j) = \sigma_f^2 \exp\!\left( -\frac{\lVert \theta_i - \theta_j \rVert^2}{2 l^2} \right)$$
with $l$ being the length scale, which characterizes how strongly correlated the random variables are depending on their distance, and $\sigma_f^2$ the signal variance, which reflects the range of the function values.
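To give a feel for the length scale, the short snippet below (reusing the se_kernel helper from the sketch above, with arbitrary values) shows that a larger $l$ makes two parameter sets one unit apart more strongly correlated:

```python
import numpy as np

a = np.array([[0.0]])
b = np.array([[1.0]])                      # one unit apart in parameter space
for l in (0.5, 1.0, 2.0):
    k = se_kernel(a, b, length_scale=l, signal_var=1.0)
    print(f"l = {l}: k = {k[0, 0]:.3f}")   # correlation grows with l
```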
Finally, we model the observations as a linear transformation of the latent variable $f$ with added Gaussian noise. The questions that arise during this process are how to select the test points for prediction from the infinitely many possibilities in the parameter space, and how BO decides on the next iteration point, leveraging the information gathered from the GP. For the first question, there are many different search methods introduced in the literature, e.g., "local random search". For the second question, BO typically decides the next sample location by optimizing the so-called acquisition function $\alpha$, e.g., Expected Improvement (EI), using the mean and variance predictions of the GP. It selects the next parameter set where the expected improvement over the current minimum among all the explored data is maximal. In this way, the model instance with this new parameter set, which is optimal up to the current iteration, is used for the controller in the subsequent step.
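A minimal sketch of this decision step is given below, continuing the helpers defined above. It uses Expected Improvement for minimization and plain uniform random sampling over a box as a simple stand-in for the local random search mentioned in the text; the candidate count and the exploration parameter xi are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, J_best, xi=0.01):
    """EI for minimization: expected reduction below the best observed cost."""
    sigma = np.maximum(sigma, 1e-12)            # avoid division by zero
    z = (J_best - mu - xi) / sigma
    return (J_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def propose_next(theta_train, J_train, lower, upper, n_candidates=1000, rng=None):
    """Pick the candidate parameter set that maximizes EI, using random search
    over the box [lower, upper] as a simple stand-in for local random search."""
    rng = rng or np.random.default_rng()
    candidates = rng.uniform(lower, upper, size=(n_candidates, len(lower)))
    mu, cov = gp_posterior(theta_train, J_train, candidates)   # helper from above
    sigma = np.sqrt(np.maximum(np.diag(cov), 0.0))
    ei = expected_improvement(mu, sigma, J_best=np.min(J_train))
    return candidates[np.argmax(ei)]
```

A full learning run then simply alternates between propose_next, an experiment on the system via evaluate_performance, and appending the new observation to the data set.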
It is important to note that the decision on the next model parameter set is an internal loop of the BO framework, which does not require experiments on the real system; therefore, the approach is extremely data efficient. To speed up the algorithm's convergence, one can also set constraints on the search space of the model parameters based on prior knowledge of the system. For example, the signs of certain parameters can be predetermined when their meaning can be physically interpreted.
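As a simple illustration, such prior knowledge can be encoded in the bounds handed to the candidate search above; the parameter roles and numerical values here are purely hypothetical.

```python
import numpy as np

# Hypothetical example: prior knowledge says the first model parameter must be
# positive (e.g., a gain) and the second negative (e.g., a damping term), so
# the search box for the candidate sampling is restricted accordingly.
lower = np.array([0.0, -2.0])
upper = np.array([5.0,  0.0])
# theta_next = propose_next(theta_train, J_train, lower, upper)
```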