3.1. MPC problem formulation
The Model Predictive Control problem is, in its essence, a recurring optimization problem that searches for the optimal control input over the receding control horizon, and in most cases, solves as well for the state variables over the prediction horizon. Only the first control input in the control horizon is realized by the actuators. In the context of our problem, attitude dynamics are coupled with the ROE-based relative translational motion, and the state vector is set to,
where the dynamics of the state vector are described by (
3) and (
10). The control input, on the other hand, is set to
where
u is the magnitude of the piece-wise constant input acceleration provided by the single thruster, and
is the desired quaternion. The state vector as well as the control vector are collated over the prediction and the control horizons in the two matrices
and
respectively, such that,
with
being the length of the prediction horizon, and
being the length of the control horizon, where
. It is to be noted that the notations
where
and
where
are used, and the time difference between any two consecutive instances is constant and is equal to the user-defined sampling time, i.e.
. Moreover, after the control horizon is over, the input acceleration is set to zero (i.e.
) and the desired attitude is left to be decided by the mission-specific mode management logic of the ADCS. For the sake of simulating the attitude evolution, the reference attitude at an arbitrary time step beyond the control horizon is set to the quaternion from the previous step. Formally,
.
It is also important to note that the adopted satellite is assumed, without loss of generality of the approach, to have its thruster directed along the z-direction of its body frame, and hence the input acceleration vector in
and in
can be related as,
After having introduced the necessary notations, the optimization problem could then be formulated as,
where , , and , are the time, dimensionless ROE vector, and quaternion vector respectively at the beginning of each horizon. Note that these are receding quantities that are specific to each prediction horizon. Accordingly, they coincide to the quantities at the beginning of the simulation only once, at the time of the first prediction. Moreover, is the maximum applicable acceleration that can be provided by the thruster, is the angular velocity vector of the satellite, and is the maximum allowable angular rate around any axis.
Indeed, a constant value for the input acceleration vector,
, has to be fed to the ROE dynamics constraint (
16). However, the attitude dynamics are much faster than that of the ROE, and since the vector
is attitude-dependent, its value can change significantly from
to
even when
is kept constant. Therefore, a prediction of
,
, had to be fixed and fed to the dynamics constraint (
16). The value of
is set to that of
at time
, with its formal definition being as follows,
The attitude kinematics (
10) are discretized through a symbolic Runge-Kutta
-order scheme and the discrete kinematics are imposed as constraint (
17) in Problem 1. Moreover, the discrete angular velocity vector in constraint (
19) in Problem 1,
, could be obtained by applying formula (
9) using the discrete unit quaternion signal,
. It is also worth noting that the squared norms (
,
, and
) are constrained instead of the norms themselves. This formulation is adopted to facilitate the job of the optimizer since it allows the Jacobian of the constraints to be defined at all values of
and
.
The cost function components of Problem 1,
,
, and
are defined as,
where
is the reference ROE vector, while
,
,
, and
are positive-definite MPC gains.
It is important to note that the ultimate goal of autonomous orbit keeping is to rendezvous with a virtual target then track its absolute orbit, i.e.
. Moreover, the matrices
and
are chosen to be diagonal matrices, which renders
a summation over the squared weighted 2-norms of the error vector,
. This ultimately means that
is a measure of distance between
and
, which is zero in our case, in a scaled-ROE space. Indeed, reasoning
in terms of the magnitude of the error vector in a scaled-ROE space does not only allow to eventually drive
to
, and hence for the satellite to rendezvous and then track the orbit of the virtual target, but also allows to indirectly minimize the total delta-V cost, since the distance in the ROE space, or in a scaled-ROE space for that matter (e.g. the dimensional ROE space, which scales the ROE vector by the mean semi-major axis of the chief), can be used as a measure for the total delta-V cost [
12,
13]. A closer look at
reveals that while minimizing it indeed drives the error signal to zero, it is, in fact, an implicit trade-off between fuel and time optimality, depending on the ratio between the traces of
and
respectively. The greater the value of this ratio, the more inclined towards fuel-optimality the cost-function becomes, while the lower the ratio, the more the scheme leans towards time-optimality.
Adding
to the compound cost function is set to directly minimize the total required thrust from the onboard single thruster. The adopted convex form of
is the fuel-optimal form of a cost function, which also coincides with the delta-V-optimal form for single-thruster satellites [
20].
Finally, the purpose of adding the last component of the cost function, , is to softly constraint the desired attitude to be as close as possible to the actual attitude in order to avoid unnecessary large slews, which in-turn minimizes the attitude control effort. This soft constraint allows large slew maneuvers, depending on the choice of the MPC gains, only when it is meaningful from the ROE error (i.e. ) or fuel cost (i.e. ) points of view.
3.2. Parameters tuning
In the proposed formulation of MPC there are many parameters and gains to be chosen, which can affect the overall performance and robustness of the MPC. In many industrial applications the choice of such parameters is done by engineering experience. In this application, the initial guess for the prediction horizon has been formulated based on the literature review. Since the
-optimal locations for ROE changes through impulsive delta-V increments are known to occur, at most, every half-orbit [
13], and since the input acceleration of the low-thrust propulsion system should be distributed around these
-optimal locations (if
optimality is sought), the prediction horizon,
, is chosen to be at least half the orbital period. In this setting, the MPC is able to foresee all the
-optimal locations throughout a certain orbit, which hence leads to optimal allocation of the available thrust. Overly increasing the prediction horizon is expected to only overwhelm the onboard processor, while not having much effect on the optimality of the solution.
The choice of the control horizon,
, on the other hand, is done by trial and error, which is an iterative process as all the other parameters have to be fixed before the tuning process starts. Nonetheless, regarding the choice of the
value, it has to be a small positive integer less than or equal to
. Common guidelines for choosing the prediction and control horizons could be found in [
21].
Tuning of the sampling time,
, has also been performed by trial and error. In the context of Problem 1, although the dynamics of the ROE are slow and the sampling time could be chosen a large value in order to not solve the optimization problem so frequently, the following important consideration has to be taken into account to carry out the tuning process. Since the attitude dynamics are much faster than that of the ROE, one could not ignore that a fixed value for the input acceleration vector in the RTN frame,
, is fed to the linearized ROE dynamics constraint (
16), see (
22), which renders the model used in the MPC not accurate unless a small value of the sampling time is chosen. Indeed, Including
in the compound cost function helps in slowing down the attitude dynamics. Still, the need to choose a scant value for the sampling time is meaningful. A compromise had to be done between choosing a small
and have a more accurate model for the MPC on one hand, and choosing a large
which allows the optimization problem to be solved less frequently on the other.
The cost function gains (i.e. , , , and in this paper) are more challenging to be tuned, since the cost function of the MPC is problem dependent and no general guidelines exist for its weighting. Thus, more focus is put on analyzing the effect of choosing different cost function weights and on actually selecting an optimal set of these weights.
In order to reduce the number of tunable parameters, the component of the cost function, i.e.
,
, and
, are related to the main weighting matrix,
, such that,
where
with
being the expected value of
over the prediction horizon, which is approximated by the ROE vector at the beginning of each horizon, and
and
are predictions for the values of
u and
over the control horizon. The approximated expected values,
,
, and
, are illustrated graphically over one prediction horizon in
Figure 3. For imaging purposes, each of them is taken one dimensional. Indeed, the values assumed for
and
are merely the authors’ predictions for the expected values of these quantities over the whole simulation, however, choosing other values for these variables shall not affect the final MPC gains as will be elaborated upon later in the text.
Substituting from (
25) into (
24), the MPC gains can be related to
,
,
, and
as,
where
and
change for every prediction horizon.
It is conceivable that when
is very small (i.e. the rendezvous of the satellite with its virtual target has already taken place), the priority should be given to
in order for the ADCS not to hyper-react to the small errors in
. Indeed, the phenomenon of hyper-reaction of the ADCS when
gets too small has been verified by numerical simulation for a system which is using the definition of
in (
26). It is for this reason that
had to be redefined in order to steer the priority to
when
is approaching zero, such that,
where
is the saturation function which is defined as follows,
Looking at equations (
24), (
25), (
26), and (
27), it is clear that in order to define all the MPC gains, one has to choose
,
,
,
, and
.
The choice of
is rather simple since all the ROE can be weighted equally. However, in order to enhance the stability of the final relative orbit (i.e. to keep the obtained orbit for as long as possible without being much affected by the perturbations), an emphasis is put on minimizing
when the ROE are close to the target ones, and
is defined as,
where
Q is a tuning parameter that controls the order of magnitude of the cost function, and
is the identity matrix with dimensions
.
Fixing the value of
Q to
, of
to 10, and of
to
, a brute force sensitivity analysis over
and
is carried out to choose adroit values for
and
. Thanks to the High Performance Computer of the University of Luxembourg [
22], 726 simulations could be run simultaneously where in addition to interchanging the values of
and
, also the seed of the Random Number Generator (RNG), which controls the initial state at the beginning of the simulation, is taken into account. In these simulations, and in the sequel, warm-starting is employed at the beginning of each prediction horizon, meaning that the optimized state and control profiles from the previous prediction horizon are utilized to construct the initial guess for the current one.
Before starting the simulations, it has been noted that the admissible values for
and
belong to the period
since the most important component of the cost function is
and since the two other components are normalized with respect to
(see (
24)).
One simulation would have its RNG seed,
, and
as one possible combination from the following sets,
To this end, a fitness function needs to be introduced in order to asses how good the output of a simulation is. Such fitness function in our context needed to address four criteria (performance metrics),
Driving the ROE vector to zero, which is the main goal of the MPC scheme to achieve orbit keeping. This is assessed through observing the norm of the finale of the ROE time series, denoted as .
Enhancing the stability of the relative orbit finale which, in other words, is minimizing the Root-Mean-Square (RMS) of the last portion of the relative semi-major axis profile over time, denoted as .
Minimizing the total delta-V, .
Reducing the attitude effort, which is quantified through the mean angular rate of the satellite, .
It is necessary to state that the term "
finale" in this paper refers to the last
of the simulation time span. Furthermore, although this research is concerned with low-thrust propellers, the total delta-V is considered instead of the total thrust in order to enable comparisons between the proposed scheme and those from the literature. The total delta-V in our context can be calculated using the following formula,
where
and
are the initial and final times of the simulation respectively. Given that the input thrust/acceleration from the single thruster is piece-wise constant, the total delta-V can be rewritten as follows,
with
being the number of receding horizons being processed within a simulation,
j is the horizon index, and
being the fixed sampling time.
The adopted overall fitness function could be then written as,
where
,
,
, and
are weights that determine the importance of each of the four criterion, and the function
is defined as,
with
and
being the minimum and maximum functions over all the simulations. It is clear the this form of the fitness function only shoots out values between 0 and 1.
Running the simulations for the 726 combinations in (
30) for a fixed chief’s initial orbit, defined in
Table 1, applying the definition of the fitness function in (
34) to all the 726 simulations and to the four performance metrics, and fixing the values of the weights to
,
,
, and
, which reflects the high importance of minimizing
in comparison to the other metrics, the heat maps which summarize all the simulations could be obtained and are presented in
Figure 4.
The heat maps in
Figure 4 could only be obtained after averaging the fitness over each RNG seed in (
30), and after adopting a scattered data interpolant in order to smoothen the heat maps.
It is clear from the figures that the smaller is, the better and become, while this relation is conceivably inverted for . Furthermore, and could be noticed to be positively correlated, and are both negatively correlated with and . Further investigations on the simulations where and are both large revealed that minimal input acceleration is provided at the current attitude of the satellite which is almost stagnant, while the effect of this acceleration on the orbit correction is, as well, minimal. Nonetheless, the ROE vector is approaching its set point although very slowly.
The overall fitness function,
, is depicted in
Figure 5 and the fittest point, i.e. the point with maximum overall fitness, (
and
) is marked with the gray circle on the heat map. It is clear from the fittest
-
combination that, for the initial chief orbit in
Table 1, it was not necessary to include the direct delta-V penalty,
, in the cost function, since
was already indirectly accounting for delta-V cost as previously mentioned. One has to acknowledge that the fittest combination of
and
in
Figure 5 is not necessarily universal for any arbitrary initial chief’s orbit, and the selection of the optimal
-
combination needs to be redone for each initial orbit of the virtual chief. This does not represent a limitation of the approach, since the reference orbit of the chief is fixed once as soon as the scientific mission is designed.
The employed approach served the need to identify an adroit set of combination of and . Future investigations may focus on the set up of a heuristic approach, e.g. genetic algorithms, to search for the combination that globally maximizes the overall fitness function. In this context, note that the definition of the fitness functions and of the related parameters impact the global optimum.
Finally, the values of the optimized
and
are heavily dependant on the authors’ prediction for
and
, and another choice of these quantities would have definitely led to different optimal
and
, however, the values of
and
will not change since
is directly divided by
to obtain
and the same dynamics apply for
with the squared 2-norm of
to obtain
( see (
26)).
3.3. Closing the control loop
The MPC described so far is one key module of the whole control loop, which, at practical implementation, requires feedback to operate properly, that is provided by the navigation system, which in-turn relies on sensors as well as a state estimation filter. The full closed loop, which illustrates the module execution logic onboard of the deputy spacecraft, is depicted in
Figure 6. There, Osc2Mean is the function that transforms osculating orbital elements to mean ones [
15], Body2Inert is the method that rotates any vector from the body frame to the inertial frame, and OE2ROE is the function that transforms the absolute orbital elements of both the chief and the deputy to a ROE vector (see (
2)). The breve accent,
, signifies a quantity which is affected by either one or a combination of the following sources of errors:
Estimation errors (e.g. )
Actuator errors (e.g. )
Physical constraints (e.g. )
The proposed MPC is validated by meas of numerical simulations, emulating the performance of the navigation module and the effects of the physical limitations of the adopted actuators.
The physical limitations are only present in the saturation block (see
Figure 6) which could be implemented in the numerical simulations. In fact, this saturation is taken into account in the MPC implementation, but the saturation block after the MPC is implemented anyway as a safeguard in case the MPC computes an infeasible solution.
The "Sensors" and "Filter" blocks in
Figure 6 are replaced by the following surrogate models which treat the propagated state of the deputy as ground truth,
where "Cart2OE" is the method that transforms the Cartesian state to orbital elements, and
is a normally distributed random variable with
as its expected value and
as its variance. Hence,
is the covariance matrix of the random noise affecting the estimation of the Cartesian state of the deputy satellite, which is defined as,
where
and
are the variances of the 1-dimensional position and velocity estimation errors respectively, which, for the numerical simulations presented in this work, is extracted from the publicly available Triton-X brochure.
It is to be noted that the relative navigation is typically more accurate than the absolute one [
23,
24]. It is for this reason that a similar model is used to emulate the behaviour of the relative navigation system within the numerical simulations as follows,
where
is the covariance matrix of the random noise affecting the estimation of the ROE vector, which is expanded as,
with the diagonal elements being the variances of the random noise that affect each element of the ROE vector. These variances are taken to be all equal in the sequel, i.e.
.
Lastly, the "Pointing error" block is replaced by the following surrogate model,
where
is the pointing error angle, which is obtained from Triton-X brochure in the validation simulations, and
is a random 3-element unit vector.
It is important to note that although the chief, in this context, is a virtual point, it is propagated using the perturbed dynamics for two main reasons,
in many applications the tracking of the reference orbit and the absolute control of the reference orbit are handled separately;
the proposed scheme can be directly applied to the rendezvous to an actual spacecraft.
As a result, the autonomous orbit keeping compensates errors and perturbations acting on the relative dynamics.