RBF Neural Networks-Based Near-Optimal Tracking Control of Partially Unknown Discrete-Time Nonlinear Systems

Preprint

Article

RBF Neural Networks-Based Near-Optimal Tracking Control of Partially Unknown Discrete-Time Nonlinear Systems

Altmetrics

Downloads

142

Views

Comments

A peer-reviewed article of this preprint also exists.

This version is not peer-reviewed

Submitted:

29 February 2024

Posted:

01 March 2024

You are already at the latest version

Alerts

Abstract

This paper proposed an optimal tracking control scheme through adaptive dynamic programming(ADP) for a class of partially unknown discrete-time nonlinear systems based on radial basis function neural network(RBF-NN). In order to acquire the unknown system dynamics, we use two RBF-NNs, the one is used to construct the identifier, and the another is used to directly approximate the steady-state control input, where a novel adaptive law is proposed to update neural network weights. While the optimal feedback control and the cost function are derived via feedforward neural networks approximating, it is proposed to regulate the tracking error, the critic network and the actor network are then trained online to obtain the solution of the associated Hamilton–Jacobi–Bellman (HJB) equation being built under the ADP framework. Simulations verify the effectiveness of the optimal tracking control technique using the neural networks.

Keywords:

Subject: Computer Science and Mathematics - Mathematics

1. Introduction

As is widely known, nonlinear system control is an important topic of control fields, especially for uncertainly unknown nonlinear systems, which is difficult for traditional control methods. Until 1988, radial basis function neural networks were proposed [1]. Immediately following in 1900, Narendra, K. S. and K. Parthasarathy first proposed an artificial neural network adaptive control method for nonlinear dynamical systems [2]. Since then, multilayer neural networks (MNN) and radial basis function (RBF) neural networks were successfully applied in pattern recognition and control systems [3]. Compared to multilayer feedforward networks (MFNs), the RBF neural networks attracted much attention due to their good generalization ability, simple network structure, and avoidance of unnecessary and lengthy computations. Studies on RBF-NNs have shown the ability of neural networks to approximate any nonlinear function with a compact set and arbitrary accuracy[4,5]. Many research results have been published on neural network control for nonlinear systems [6,7].

On the other hand, optimal tracking control as one of the effective methods for nonlinear systems in optimal control, received many practical engineering applications [8,9,10]. Therefore, exploring the optimal tracking optimal control for nonlinear systems possesses significant theoretical importance and practical value. For optimal control methods for nonlinear systems, the difficulty lies in the requirement of solving the nonlinear Hamilton-Jacobi-Belman (HJB) equation, which is usually difficult to solve analytically. Although dynamic programming is an effective method for solving optimal control problems, there is the problem of "curse of dimensionality" when facing relatively complex systems [11,12].

Faced with the difficult problem of solving nonlinear Hamilton-Jacobi-Bellman partial differential equations exactly, several methods was proposed to approximate the solutions of the Hamilton-Jacobi-Bellman equations, These include the use of reinforcement learning [8,13,14,15,16,17,18,19] and back-propagation through time [20]. Among these classical RL methods, combining the advantages of adaptive control and optimal control, the ADP algorithm was considered as one of the core methods for realizing optimal control strategies for the diversity of optimal control problems, and it has been successfully applied to both continuous-time systems [21,22,23] and discrete-time systems [24,25,26,27,28] to search for solutions of the HJB equations online. Numerous ADP and RL approaches emerged, such as robust ADP [29,30] iterative/invariant ADP [31,32,33], spiking/Hamiltionian-driven ADP [34,35], integral RL [36,37], and off-policy RL [38,39,40]. Several works have attempted to solve the discrete time nonlinear optimal regulation problem in a near optimal sense using adaptive dynamic programming through neural networks (NNs) with offline training.

In the past decades, many relevant studies was conducted on the optimal tracking control of discrete-time nonlinear system, such as generalised policy iteration adaptive dynamic programming[41], actor-critic algorithm [42], heuristic dynamic programming (HDP)[43], greedy heuristic dynamic programming iteration algorithm[44] and Q-Learning Algorithm[45]. However, in the known literatures, optimal tracking control methods using RBF-NNs applied to the ADP algorithm are barely used.

In this paper, an optimal tracking control method RBF-NNs-based for discrete-time partially unknown nonlinear systems is proposed, two RBF neural networks are used to approximate the unknown system dynamic as well as the steady-state control, and after transforming the tracking problem into a regulation problem, two feedforward neural networks are used to approximate the critic network and the actor network to obtain the error feedback control, which allows the online learning process to require only current and past system data rather than the exact system dynamics.

The contributions of article are as follows: (1) Unlike classical technique of NNs approximating [42,44,45,46], we propose an near-optimal tracking control scheme for a class of partially unknown discrete-time nonlinear systems based on RBF-NNs and the stability of systems is proved by the Lyapunov theory. (2) Compared with [41,44], we additionally used an RBF-NN to directly approximate the the steady-state controller of the unknown system, it can solve the requirement for the priori knowledge of the controlled system dynamics and reference system dynamics. (3) For the inverse dynamic NN to directly approximate the steady-state controller of the system, we propose an novel adaptive law to update the weight of the RBF-NN, and the convergence is completed through the selection of constants.

The organization of this paper is as follows. The problem statement is shown in Section II. The technique of the system with partially unknown nonlinear dynamic are designed in Section III, where include the RBF-NN identifier, the RBF-NN steady-state controller, near optimal feedback controller and stability analysis. Simulations and experimental results are provided in Sections IV to validate the proposed control method. Section V draws some conclusions.

2. Problem Statement

In this paper, we consider the following discrete-time nonlinear system:

x (k + 1) = f [x (k)] + g [x (k)] u (k)

(1)

where

x (k) \in R^{n}

is the measurable system state and

u (k) \in R^{m}

is the control input. Assume that the nonlinear smooth function

f [x (k)] \in R^{n}

is an unknown drift function,

g [x (k)] \in R^{n × m}

is a known function and

{∥ g [x (k)] ∥}_{F} \leq g_{1}

where the Frobenius norm

{∥ \cdot ∥}_{F}

is applied. In addition, assuming that there exists a matrix

{g [x (k)]}^{+} \in R^{m × n}

such that

g [x (k)] {g [x (k)]}^{+} = I \in R^{n × n}

where I is the identity matrix. Let

x (0)

be the initial state.

The reference trajectory is generated by the following bounded command:

x_{d} (k + 1) = φ (x_{d} (k))

(2)

where

x_{d} (k)

\in R^{n}

and

φ (x_{d} (k)) \in R^{n}

, and

x_{d} (k)

is the reference trajectory, which needs only to be stable in the sense of Lyapunov, not necessarily asymptotically stable.

Let

u (k)

be an arbitrary sequence of controls from k to infinity. The goal of this paper is to design a controller

u (k)

that not only ensures the state of system (1) tracks the reference trajectory, but also minimizes the cost function

\begin{matrix} J (e (k), u (k)) = \sum_{k = 0}^{\infty} e^{T} (k) Q e (k) + u^{T} (k) R u (k) \end{matrix}

(3)

where

Q \in R^{n × n}

and

R \in R^{m × m}

are symmetric positive definite;

e (k) = x (k) - x_{d} (k)

is tracking error. For common solutions of tracking problems [47], the control input consists of two parts, a steady-state input

u_{d}

and a feedback input

u_{e}

. Next, we will discuss how obtain each part.

The steady-state of the control input is used to ensure perfect tracking. This perfect tracking equation is realized under the condition

x (k) = x_{d} (k)

. For this condition to be fulfilled, the steady-state part of the control

u_{d} (k)

must exist to make

x (k)

equivalent

x_{d} (k)

. By substituting

x_{d} (k)

and

u_{d} (k)

into system (1), the reference state is

x_{d} (k + 1) = f [x_{d} (k)] + g [x_{d} (k)] u_{d} (k)

(4)

If the system dynamics (1) are known,

u_{d} (k)

is acquired by

u_{d} (k) = g {[x_{d} (k)]}^{+} (x_{d} (k + 1) - f [x_{d} (k)])

(5)

where

g {[x_{d} (k)]}^{+} = {(g {[x_{d} (k)]}^{T} g [x_{d} (k)])}^{- 1} g {[x_{d} (k)]}^{T}

is the generalized inverse of

g [x_{d} (k)]

with

g {[x_{d} (k)]}^{+} g [x_{d} (k)] = I

By using (1) and (4), the tracking error dynamics

e (k)

are given by

\begin{matrix} e (k + 1) & = f [x (k)] + g [x (k)] u (k) - x_{d} (k + 1) \\ = f_{e} (k) + g_{e} (k) u_{e} (k) \end{matrix}

(6)

where

f_{e} (k) = g (e (k) + x_{d} (k)) g {(x_{d} (k))}^{+} (φ (x_{d} (k)) - f (x_{d} (k))) + f (e (k) + x_{d} (k)) - φ (x_{d} (k))

u_{e} (k) = u (k) - u_{d} (k)

and

g_{e} (k) = g [x_{d} (k) + e (k)]

u_{e} (k)

\in R^{m}

is the feedback control input. By minimizing the cost function, it is designed to stabilize the tracking error dynamics. For

e (k)

under the control sequence, the cost function is defined as

\begin{matrix} J_{e} (e (k), u_{e} (k)) & = \sum_{k = 0}^{\infty} e^{T} (k) Q e (k) + u_{e}^{T} (k) R u_{e} (k) \\ = e^{T} (k) Q e (k) + u_{e}^{T} (k) R u_{e} (k) + J_{e} (e (k + 1), u_{e} (k + 1)) \\ = r (k) + J_{e} (e (k + 1), u_{e} (k + 1)) \end{matrix}

(7)

where

r (k) = e^{T} (k) Q e (k) + u_{e}^{T} (k) R u_{e} (k)

, and

J_{e} (e (k), u_{e} (k)) > 0

for

\forall e (k), u_{e} (k) \neq 0

Q \in R^{n × n}

and

R \in R^{m × m}

are symmetric positive definite,

x_{d} (k)

and

x_{d} (k + 1)

are bounded to be tracked by the reference trajectory. The tracking error

e (k)

is used in this study of the cost function of the optimal tracking control problem. This feedback control

u_{e} (k)

is found by minimizing (7) to solve the extremum condition in the optimal control framework[8]. This result is

u_{e}^{*} (k) = - \frac{1}{2} R^{- 1} g_{e} (k) \frac{\partial J (e (k + 1))}{\partial e (k + 1)}

(8)

Then the standard control input is obtained

u^{*} (k) = u_{d} (k) + u_{e}^{*} (k)

(9)

where

u_{d} (k)

is obtained from (5),

u_{e}^{*} (k)

is obtained from (8).

Remark 1.

In oreder to acquire the unknown dynamics information in system (1), we used an RBF neural network to reconstruct system dynamics. Therefore, we can use (5) to obtain the steady-state control

u_{d} (k)

The main results of this paper are based on the following definitions and assumptions.

Assumption A1.

System (1) is controllable, and the system state

x (k) = 0

is in equilibrium under control

u (k) = 0

. Input control

u (k) = u (x (k))

satisfies

u (x (k)) = 0

for

x (k) = 0

, and cost function is a positive definite function for any

x (k)

and

u (k)

Definition 1.

A control law

u_{e}

is admissible with respect to (7) on the set Ω , if

u_{e}

is continuous on a compact set

Ω_{u} \in R

for

\forall e (k) \in Ω

u_{e} (0) = 0

, and

J (e (0), u_{e} (\cdot))

is finite.

Lemma 1.

For the tracking error system (6), assume that

u_{e} (k)

be an admissible control and the internal dynamics

f_{e} (k)

is bounded, and

\begin{matrix} ∥ f_{e} {(k) ∥}^{2} \leq & Γ λ_{min} (Q) {∥ e (k) ∥}^{2} / 2 \\ + (Γ λ_{min} (R) - 2 g_{1}^{2}) {∥ u_{e} (k) ∥}^{2} / 2, \end{matrix}

(10)

where

λ_{min} (R)

is the minimum eigenvalue of R,

λ_{min} (Q)

is the minimum eigenvalue of Q, and

Γ > 2 g_{1}^{2} / λ_{min} (R)

is a known positive constant. Then, the tracking error system (6) be asymptotically stable.

Proof.

Considering the following positive definite Lyapunov function,

V (k) = e^{T} (k) e (k) + Γ J_{e} (k)

(11)

where

J_{e} (k) = J_{e} (e (k), u_{e} (k))

is defined in (7). Differencing the Lyapunov function yields

Δ V (k) = e^{T} (k + 1) e (k + 1) - e^{T} (k) e (k) + Γ (J_{e} (k + 1) - J_{e} (k))

(12)

Using (6) and (7), we can obtain

\begin{matrix} Δ V (k) = & {(f_{e} (k) + g_{e} (k) u_{e} (k))}^{T} (f_{e} (k) + g_{e} (k) u_{e} (k)) \\ - e^{T} (k) e (k) - Γ (e^{T} (k) Q e (k) + u_{e}^{T} (k) R u_{e} (k)) \end{matrix}

(13)

Applying the Cauchy–Schwarz inequality yields

\begin{matrix} Δ V (k) \leq 2 ∥ f_{e} {(k) ∥}^{2} - (Γ λ_{\min} (R) - 2 g_{1}^{2}) {∥ u_{e} (k) ∥}^{2} \\ - Γ λ_{\min} {(Q) ∥ e (k) ∥}^{2} - {∥ e (k) ∥}^{2} \end{matrix}

(14)

Considering the goal of the tracking error system (6) being asymptotically stable, i.e.,

Δ V (k) < 0

, we require

\begin{matrix} 2 ∥ f_{e} {(k) ∥}^{2} \leq & Γ λ_{min} (Q) {∥ e (k) ∥}^{2} \\ + (Γ λ_{min} (R) - 2 g_{1}^{2}) {∥ u_{e} (k) ∥}^{2} \end{matrix}

(15)

Therefore, if the bound in (10) is satisfied, we can get

Δ V (k) < 0

and the asymptotic stability of the tracking error system (6) is proved. □

Remark 2.

Lemma 1 shows that under the condition that the internal dynamics

f_{e} (k)

is bounded is bounded to satisfy (10), then, for the nonlinear system (6), there exists an admissible control

u_{e} (k)

not only stabilizes the system (6) on Ω but also guarantees that

J_{e} (k)

is finite.

3. Optimal Tracking Controller Design with Partially Unknown Dynamic

In this section, firstly, we use an RBF-NN to approximate the unknown system dynamics

f [x (k)]

, and use another RBF-NN to approximate the steady-state controller

u_{d} (k)

. Secondly, two feedback neural networks are introduced to approximate the cost function and the optimal feedback control

u_{e} (k)

. Finally, the system stability is proved by selecting an appropriate Lyapunov function.

3.1. RBF-NN Identifier Design

In this subsection, in order to capture the unknown dynamics of the system (1), an RBF-NN-based identifier is proposed. Without losses of generality, this unknown dynamics is assumed to be a smooth function within a compact set. Then this unknown dynamics (1) can be approximated by the RBF-NN as

\hat{f} (x (k)) = \hat{w_{f}} {(k)}^{T} h [x (k)] + ▵_{f} (x)

(16)

where

\hat{w_{f}} (k)

is the matrix of ideal output weights of the neural network and

h [x (k)]

is the vector of radial basis functions,

▵_{f} (x)

is the bounded approximation error,

| | Δ_{f} (x) | | < ε_{f}

, where

ε_{f}

is a positive constant.

For any non-zero approximation error

▵_{f} (x)

, there exists optimal weight matrix

{w_{f}}^{*}

such that

f (x (k)) = \hat{f} (x, w_{f}^{*}) - ▵_{f} (x)

(17)

where

w_{f}^{*}

is the optimal weight of identifier, and

\hat{f} (x, w_{f}^{*}) = w_{f}^{*} {(k)}^{T} h [x (k)]

. The output weights are updated and the hidden weights remain unchanged when training, so the neural network model identification error is

\begin{matrix} \tilde{f} (x (k)) & = f [x (k)] - \hat{f} [x (k)] \\ = \hat{f} (x, {w_{f}}^{*}) - ▵_{f} [x (k)] - \hat{w_{f}} {(k)}^{T} h [x (k)] \\ = - \tilde{w_{f}} {(k)}^{T} h [x (k)] - ▵_{f} [x (k)] \end{matrix}

(18)

where

- \tilde{w_{f}} (k) = {w_{f}}^{*} (k) - \hat{w_{f}} (k)

The weights are adjusted to minimize the following error

E (k + 1) = \frac{1}{2} {[\tilde{f} (x (k))]}^{T} [\tilde{f} (x (k))]

(19)

Using gradient descent method, the weights are updated by

\begin{matrix} Δ w_{f_{j}} (k + 1) & = - η \frac{\partial E}{\partial w_{f_{j}}} \\ = η (f (x (k)) - \hat{f} (x (k))) h [x (k)] \\ = η (\tilde{f} (x (k))) h [x (k)] \end{matrix}

(20)

and

w_{f_{j}} (k) = w_{f_{j}} (k - 1) + Δ w_{f_{j}} (k)

(21)

where

η > 0

is the learning rate of the identifier.

Assumption A2.

The error of the neural network approximation is assumed to have an upper bound, namely

▵_{f} {(x)}^{T} ▵_{f} (x) \leq \tilde{w_{f}} {(k)}^{T} \tilde{w_{f}} (k) h {[x (k)]}^{T} h [x (k)

(22)

3.2. RBF-NN Steady-State Controller Design

We use the RBF-NN to approximate the steady-state control

u_{d} (k)

directly, the inverse dynamic NN is established to approximate[48,49].

We design the steady-state control

u_{d} (k)

through the approximation of the RBF-NN

u_{d} (k) = {\hat{w_{d}}}^{T} (k) h [x_{d} (k)]

(23)

where

\hat{w_{d}}

is the actual neural network weights;

h [x_{d} (k)]

is the output of the hidden layers;

u_{d} (k)

is the output of the RBF-NN.

Let the ideal steady-state control

u_{d}^{*} (k)

u_{d}^{*} (k) = w_{d}^{* T} h [x_{d} (k)] + ε_{u}

(24)

where

w_{d}^{*}

is the optimal neural network weights and

ε_{u}

is the error vector. Assuming that

x_{d} (k + 1)

is the desired output of the system at the point

k + 1

, without considering external disturbances, the control input

u_{d}^{*} (k)

satisfies

L [x_{d} (k), u_{d}^{*} (k)] - x_{d} (k + 1) = 0

(25)

where

L [x_{d} (k), u_{d}^{*} (k)] = f [x_{d} (k)] + g [x_{d} (k)] u_{d}^{*} (k)

Thus, we can define the error

e_{m} (k)

of the approximating state as

e_{m} (k + 1) = L [x_{d} (k), u_{d} (k)] - x_{d} (k + 1)

(26)

where

L [x_{d} (k), u_{d} (k)] = f [x_{d} (k)] + g [x_{d} (k)] u_{d} (k)

(24) subtracted from (23) yields

\begin{matrix} u_{d} (k) - u_{d}^{*} (k) = {\hat{w_{d}}}^{⊤} (k) h [x_{d} (k)] - w_{d}^{* ⊤} (k) h [x_{d} (k)] - ε_{u} \\ = {\tilde{w_{d}}}^{T} (k) h [x_{d} (k)] - ε_{u} \end{matrix}

(27)

where

\tilde{w_{d}} (k) = \hat{w_{d}} (k) - w_{d}^{*} (k)

is weight approximation error.

The weights are updated by the following update law of the weights

\hat{w_{d}} (k + 1) = \hat{w_{d}} (k) - γ [h (z) e_{m} (k + 1) + σ \hat{w_{d}} (k)]

(28)

where

γ > 0

and

σ > 0

are positive constant .

Assumption A3.

Within the set

Ω_{ε}

, the ideal neural network weights

w^{*}

and the approximation error are bounded

‖ w_{d}^{*} ‖ ⩽ w_{m}, | | ε_{u} | | ⩽ ε_{l}

(29)

3.3. Near Optimal Feedback Controller Design

In this subsection, we present an adaptive dynamic programming algorithm (ADP) based on the Bellman optimality. The objective is to find the feedback control policy that minimizes the approximated cost function.

First, the initial cost function

V^{0} (e (k)) = 0

which is not necessarily the optimal value function, and then a single control vector

u_{e}^{0} (k) = 0

can be solved by

\begin{matrix} V^{0} (e (k)) = arg min_{u_{e} (k)} {e^{T} (k) Q e (k) + u_{e}^{T} (k) R u_{e} (k) + V^{0} (e (k + 1))} \end{matrix}

(30)

After that, we update the control law,

\begin{matrix} u_{e}^{1} (k) & = arg min_{u_{e} (k)} {e^{T} (k) Q e (k) + u_{e}^{T} (k) R u_{e} (k) + V^{0} (e (k + 1))} \\ = e^{T} (k) Q e (k) + {(u_{e}^{0} (k))}^{T} R u_{e}^{0} (k) \end{matrix}

(31)

hence, for

i = 1, 2, . . . .

,the adaptive dynamic programming algorithm can be realized in a continuous iterative process in

\begin{matrix} V^{i} (e (k)) & = min_{u_{e} (k)} \{e^{T} (k) Q e (k) + u_{e}^{T} (k) R u_{e} (k) + V^{i} (e (k + 1))\} \\ = - \frac{1}{2} R^{- 1} g_{e}^{T} (k) \frac{\partial V^{i} (e (k + 1))}{\partial e (k + 1)} \end{matrix}

(32)

and

\begin{matrix} u_{e}^{i + 1} (k) & = arg min_{u_{e} (k)} {e^{T} (k) Q e (k) + u_{e}^{T} (k) R u_{e} (k) + V^{i} (e (k + 1))} \\ = e^{T} (k) Q e (k) + {(u_{e}^{i} (k))}^{T} R u_{e}^{i} (k) + V^{i} (e (k + 1)) \end{matrix}

(33)

where index i represents the number of iterations of the control law and the cost function, while index k represents time index of system state trajectory. Moreover, it is worth noting in the iterative process of adaptive dynamic programming that the number of iterations of the cost function and the control law increases from zero to infinity.

To begin the development of the feedback control policy, we use neural networks to construct the critic network and the actor network.

The critic network is used for approximating the cost function

V_{i} (e (k)) .

The output of the critic network is denoted as

{\hat{V}}^{i} (e (k)) = w_{c i}^{T} z (ν_{c i}^{T} e (k)) + ε_{c} (k)

(34)

where

z (ν_{c i}^{T} e (k))

is the hidden layer function,

w_{c i}

is the hidden layer weight of the critic network,

ν_{c i}

is the input layer weight of the critic network,

ε_{c} (k)

is the approximation error.

So we define the prediction error of the critic network as

e_{c i} (k) = {\hat{V}}^{i} (e (k)) - V^{i} (e (k))

(35)

The objective function to be minimized for the critic network is

E_{c i} (k) = \frac{1}{2} e_{c i}^{T} (k) e_{c i} (k) .

(36)

The weights of critic network are updated using the gradient descent method through

w_{c i} (k + 1) = w_{c i} (k) - α_{c} [\frac{\partial E_{c i} (k)}{\partial w_{c i} (k)}]

(37)

where

α_{c} > 0

is the learning rate of the critic network, and i is the update count of internal neuron to update the weight parameters.

The inputs of the actor network is the system error

e (k)

, and the outputs of the actor network is the optimal feedback control

u_{e} (k)

. The output can be formulated as

{\hat{u}}_{e}^{i} (k) = w_{a i}^{T} z (v_{a i}^{T} e (k)) + ε_{a} (k),

(38)

where

z (ν_{a i}^{T} e (k))

is the hidden layer function,

w_{a i}

is the hidden layer weight of the actor network,

ν_{a i}

is the input layer weight of the actor network,

ε_{a} (k)

is the approximation error.

Therefore, we define the prediction error of the action network as

e_{a i} (k) = {\hat{u}}_{e}^{i} (k) - u_{e}^{i} (k)

(39)

where

{\hat{u}}_{e}^{i} (k)

is approximating optimal feedback control ,

u_{e}^{i} (k)

is the optimal feedback control at the iterative number i.

The objective function to be minimized for the action network is

E_{a i} (k) = \frac{1}{2} e_{a i}^{T} (k) e_{a i} (k)

(40)

The weights of the actor network are also updated in the same way as the critic network , we use the gradient descent method

w_{a i} (k + 1) = w_{a i} (k) - β_{a} [\frac{\partial E_{a i} (k)}{\partial w_{a i} (k)}],

(41)

where

β_{a} > 0

is the learning rate of the actor network and i is the update count of internal neuron to update the weight parameters.

3.4. Stability Analysis

In this subsection, the stability proof of the system is obtained by Lyapunov stability theory.

Assumption A4.

Radial basis function

h (t) = exp (- \frac{‖ x (t) - c (t) ‖^{2}}{2 b^{2}})

of the maximum value is

h_{m a x} = 1

, where

c (t)

is the center point and b is the width of Radial basis function. Assuming the numbers of neurons is

l \in [l_{f}, l_{d}]

for any radial basis function

h \in [h [x (k)], h [x_{d} (k)]]

, then

\begin{matrix} |h_{i}| ⩽ 1, ∥h∥ ⩽ \sqrt{l} ⩽ l, \\ h^{T} h = {∥h∥}^{2} ⩽ l \end{matrix}

(42)

we can know the maximum value

{∥h∥}^{2}

of the hidden layer with l neurons is

l \in [l_{f}, l_{d}]

, then we assume the maximum value

{∥h [x (k)]∥}^{2}

of the hidden layer for the identifier

\hat{f} (x (k))

l_{f}

, and the maximum value

{∥h_{d} [x (k)]∥}^{2}

of the hidden layer for the steady-state controller

u_{d} (k)

l_{d}

Lemma 2.

The relationship between (25) and weight approximation error (27) satisfies the following equation.

{\tilde{w_{d}}}^{T} (k) h [x_{d} (k)] = \frac{e_{m} (k + 1)}{L_{u}} + ε_{u}

(43)

where

e_{m} (k)

is the error of the approximating state

x_{d} (k)

L_{u} = \frac{\partial L}{\partial u} |_{u = ξ}, ξ \in [u_{d}^{*} (k), u_{d} (k)]

g_{1} ⩾ |\frac{\partial L}{\partial u}| > ϵ > 0

g_{1}

and ϵ are positive constants.

Proof.

Subtracting

w_{d}^{*}

from both sides of (28) , we get

\tilde{w_{d}} (k + 1) = \tilde{w_{d}} (k) - γ [h [x_{d} (k)] e_{m} (k + 1) + σ \hat{w_{d}} (k)]

(44)

Combining(25) and (27) with the mean value theorem, we can obtain

\begin{matrix} L [x_{d} (k), u_{d} (k)] & = L [x_{d} (k)], u_{d}^{*} (k) + {\tilde{w_{d}}}^{T} (k) h [x_{d} (k)] - ε_{u}] \\ = L[x_{d} (k), u_{d}^{*} (k)] + [{\tilde{w_{d}}}^{T} (k) h [x_{d} (k)] - ε_{u}] L_{u} \\ = x_{d} (k + 1) + [{\tilde{w_{d}}}^{T} (k) h [x_{d} (k)] - ε_{u}] L_{u} \end{matrix}

(45)

Further combining (45) with (26), we can obtain

\begin{matrix} e_{m} (k + 1) & = L [x_{d} (k), u_{d} (k)] - x_{d} (k + 1) \\ = [{\tilde{w_{d}}}^{T} (k) h [x_{d} (k)] - ε_{u}] L_{u} \end{matrix}

(46)

After sorting out, we can obtain

{\tilde{w_{d}}}^{T} (k) h [x_{d} (k)] = \frac{e_{m} (k + 1)}{L_{u}} + ε_{u}

(47)

the proof is completed. □

Lemma 3.

For analysis simplicity,

ε_{u}

and

e_{m} (k + 1)

have an inequality relation though using young’s inequality.

- 2 ε_{u} e_{m} (k + 1) ⩽ k_{0} ε_{l}^{2} + \frac{1}{k_{0}} e_{m}^{2} (k + 1)

(48)

where

k_{0}

is a positive constant.

From Figure 1, it can be seen that with

e (k)

x_{d} (k)

and

u_{e}^{i} (k)

, the estimated error

e (k + 1)

can be obtained with the aid of the RBF-NN identifier and the steady-state controller. Using the steady-state controller, we can obtain the reference trajectory

x_{d} (k)

corresponding to the steady-state controller

u_{d} (k)

. Using the ADP algorithm, we can obtain optimal feedback controller

u_{e}^{i} (k)

. Then, the actual controller

u (k) = u_{e}^{i} (k) + u_{d} (k)

and system dynamic

x (k + 1)

can be obtained. Furthermore, with

x_{d} (k)

and

x (k)

we can get the estimated tracking error

e (k)

,further obtained

e (k + 1)

. Finally, we can reconstruct the system dynamic to track the reference trajectory.

Theorem 1.

For the optimal tracking problem (1)-(3), the RBF-NN identifier (16) is used to approximate

f (x (k))

, the steady-state controller

u_{d} (k)

is approximated by the RBF-NN (23), and the feedforward networks (34),(38) is used to approximate the cost function

J (e (k), u (k))

and the feedback controller

u_{e} (k)

, respectively. Assume that the parameters satisfy the following inequality,

\begin{matrix} (a) 0 < η ⩽ \frac{1}{l_{f}} \\ (b) 0 < g_{1} ⩽ k_{0} \\ (c) 0 < (1 + σ) l_{d} γ ⩽ \frac{1}{g_{1}} - \frac{1}{k_{0}} \\ (d) 0 < (l_{d} + σ) γ ⩽ 1 \\ (e) a_{c} \leq 2 / {∥ z (v_{c i}^{T} e (k)) ∥}^{2} \\ (f) β_{a} \leq 2 / {∥ z (ν_{a i}^{T} e (k)) ∥}^{2} \end{matrix}

(49)

where η is the learning rate of the RBF-NN identifier, σ and γ are the update parameters of the steady-state controller approximating network weights,

a_{c}

is the learning rate of the actor network,

β_{a}

is the learning rate of the critic network,

z (v_{c i}^{T} e (k))

and

z (ν_{a i}^{T} e (k))

are hidden layer function of the actor network and the critic network. Then, the closed loop system (6) of approximating error is asymptotically stable when the parameter estimation errors are bounded.

Proof.

Considering the following positive definite Lyapunov function candidate

\begin{matrix} J (k) & = J_{1} (k) + J_{2} (k) + J_{3} (k) + J_{4} (k) \\ = \frac{1}{η} \tilde{w_{f}} {(k)}^{T} \tilde{w_{f}} (k) + \frac{1}{g_{1}} e_{m}^{2} (k) + \frac{1}{γ} \tilde{w_{d}} {(k)}^{T} \tilde{w_{d}} (k) + w_{c i} {(k)}^{T} w_{c i} (k) + w_{a i} {(k)}^{T} w_{a i} (k) \end{matrix}

(50)

where

J_{1} (k) = \frac{1}{η} \tilde{w_{f}} {(k)}^{T} \tilde{w_{f}} (k)

J_{2} (k) = \frac{1}{g_{1}} e_{m}^{2} (k) + \frac{1}{γ} \tilde{w_{d}} {(k)}^{T} \tilde{w_{d}} (k)

J_{3} (k) = w_{c i} {(k)}^{T} w_{c i} (k)

J_{4} (k) = w_{a i} {(k)}^{T} w_{a i} (k)

Firstly, differencing it according to the Lyapunov function of

J_{1} (k) = \frac{1}{η} \tilde{w_{f}} {(k)}^{T} \tilde{w_{f}} (k)

yields

\begin{matrix} Δ J_{1} (k) & = J_{1} (k + 1) - J_{1} (k) \\ = \frac{1}{η} \tilde{w_{f}} {(k + 1)}^{T} \tilde{w_{f}} (k + 1) - \frac{1}{η} \tilde{w_{f}} {(k)}^{T} \tilde{w_{f}} (k) \\ = \frac{1}{η} {[\tilde{w_{f}} (k) + η \tilde{f} (x (k)) h [x (k)]]}^{T} [\tilde{w_{f}} (k) + η \tilde{f} (x (k)) h [x (k)]] - \frac{1}{η} \tilde{w_{f}} {(k)}^{T} \tilde{w_{f}} (k) \\ = \frac{1}{η} [\tilde{w_{f}} {(k)}^{T} \tilde{w_{f}} (k) - \frac{1}{η} \tilde{w_{f}} {(k)}^{T} \tilde{w_{f}} (k) + η^{2} [\tilde{f} {(x (k))}^{T} \tilde{f} (x (k)) h {[x (k)]}^{T} h [x (k)] \\ + 2 η \tilde{f} (x (k)) \tilde{w_{f}} {(k)}^{T} h [x (k)] \\ = η [{[\tilde{w_{f}} {(k)}^{T} h [x (k)] + ▵_{f} [x]]}^{T} [\tilde{w_{f}} {(k)}^{T} h [x (k)] + ▵_{f} [x]] h {[x (k)]}^{T} h [x (k)]] \\ + 2 [\tilde{w_{f}} {(k)}^{T} h [x (k)] + ▵_{f} [x]] \tilde{w_{f}} {(k)}^{T} h [x (k)] \\ = η [\tilde{w_{f}} {(k)}^{T} \tilde{w_{f}} (k) h {[x (k)]}^{T} h [x (k)] + 2 \tilde{w_{f}} {(k)}^{T} h [x (k)] ▵_{f} [x] - 2 \tilde{w_{f}} {(k)}^{T} h [x (k)] \\ - 2 \tilde{w_{f}} {(k)}^{T} \tilde{w_{f}} (k) h {[x (k)]}^{T} h [x (k)] + ▵_{f} {[x]}^{T} ▵_{f} [x] h {[x (k)]}^{T} h [x (k)] \end{matrix}

(51)

According to the Assumption 2, Assumption 4 and (42), (51) can be done to get

\begin{matrix} Δ J_{1} (k) ⩽ & η l_{f}^{2} ‖ \tilde{w_{f}} {(k) ‖}^{2} - 2 l_{f} ‖ \tilde{w_{f}} {(k) ‖}^{2} + η l_{f}^{2} {‖ \tilde{w_{f}} (k) ‖}^{2} \\ + 2 η l_{f} \tilde{w_{f}} {(k)}^{T} h [x (k)] ▵_{f} [x] - 2 \tilde{w_{f}} {(k)}^{T} h [x (k)] ▵_{f} [x] \\ ⩽ ‖ \tilde{w_{f}} {(k) ‖}^{2} (2 η l_{f}^{2} - 2 l_{f}) + (2 l_{f} η - 2) \tilde{w_{f}} {(k)}^{T} h [x (k)] ▵_{f} [x] \\ ⩽ ‖ \tilde{w_{f}} {(k) ‖}^{2} (4 l_{f}^{2} η - 4 l_{f}) \end{matrix}

(52)

After that, differencing it according to the Lyapunov function of

J_{2} (k) = \frac{1}{g_{1}} e_{m}^{2} (k) + \frac{1}{γ} \tilde{w_{d}} {(k)}^{T} \tilde{w_{d}} (k)

yields

\begin{matrix} Δ J_{2} (k) & = J_{2} (k + 1) - J_{2} (k) \\ = \frac{1}{g_{1}} [e_{m}^{2} (k + 1) - e_{m}^{2} (k)] - \frac{1}{γ} \tilde{w_{d}} {(k)}^{T} \tilde{w_{d}} (k) + \frac{1}{γ} \tilde{w_{d}} {(k + 1)}^{T} \tilde{w_{d}} (k + 1) \\ = \frac{1}{γ} {〈 \bar{w_{d}} (k) - γ [h [x_{d} (k)] e_{m} (k + 1) + σ \hat{w_{d}} (k)] 〉}^{T} 〈 \tilde{w_{d}} (k) - γ [h [x_{d} (k)] e_{m} (k + 1) + σ \hat{w_{d}} (k)] 〉 \\ - \frac{1}{γ} \tilde{w_{d}} {(k)}^{T} \tilde{w_{d}} (k) + \frac{1}{g_{1}} [e_{m}^{2} (k + 1) - e_{m}^{2} (k)] \\ = \frac{1}{g_{1}} [e_{m}^{2} (k + 1) - e_{m}^{2} (k)] - 2 \tilde{w_{d}} {(k)}^{T} h [x_{d} (k)] e_{m} (k + 1) + γ σ^{2} \hat{w_{d}} {(k)}^{T} \hat{w_{d}} (k) \\ - 2 σ \tilde{w_{d}} {(k)}^{T} \hat{w_{d}} (k) + γ h^{T} [x_{d} (k)] h [x_{d} (k)] e_{m}^{2} (k + 1) + 2 γ σ {\hat{w_{d}}}^{T} (k) h [x_{d} (k)] e_{m} (k + 1) \end{matrix}

(53)

where

\begin{matrix} 2 σ \tilde{w_{d}} {(k)}^{T} \hat{w_{d}} (k) & = σ \tilde{w_{d}} {(k)}^{⊤} [\bar{w_{d}} (k) + w_{d}^{*}] + σ {[\hat{w_{d}} (k) - ω_{d}^{*}]}^{T} \hat{w_{d}} (k) \\ = σ ‖ \tilde{w_{d}} {(k) ‖}^{2} + {‖ \hat{w_{d}} (k) ‖}^{2} + \tilde{w_{d}} {(k)}^{T} w_{d}^{*} - w_{d}^{*} \hat{w_{d}} {(k)}^{T} \\ = σ [‖ \tilde{w_{d}} (k) ‖^{2} + ‖ \hat{w_{d}} (k) ‖^{2} - ‖ w_{d}^{*} ‖^{2}], \\ γ h^{T} [x_{d} (k)] h [x_{d} (k)] e_{m}^{2} (k + 1) ⩽ γ l_{d} e_{m}^{2} (k + 1), \\ 2 γ σ {\hat{w_{d}}}^{T} (k) h [x_{d} (k)] e_{m} (k + 1) ⩽ γ σ l_{d} [‖ \hat{w_{d}} (k) ‖^{2} + e_{m}^{2} (k + 1)], \\ γ σ^{2} {\hat{w_{d}}}^{T} (k) \hat{w_{d}} (k) = γ σ^{2} {∥\hat{w_{d}} (k)∥}^{2} \end{matrix}

(54)

Substituting (54) into (53) yields

\begin{matrix} Δ J_{2} (k) & ⩽ \frac{1}{g_{1}} [e_{m}^{2} (k + 1) - e_{m}^{2} (k)] - 2 [\frac{e_{m} (k + 1)}{L_{u}} + ε_{u}] e_{m} (k + 1) \\ - σ [‖ \tilde{w_{d}} {(k) ‖}^{2} + γ σ^{2} ‖ \hat{w_{d}} {(k) ‖}^{2} + ‖ \hat{w_{d}} {(k) ‖}^{2} - {‖ w_{d}^{*} ‖}^{2}] + γ l_{d} e_{m}^{2} (k + 1) \\ + γ σ l_{d} [‖ \hat{w_{d}} (k) ‖^{2} + e_{m}^{2} (k + 1)] \\ = [\frac{1}{g_{1}} - \frac{2}{L_{u}} + γ (1 + σ) l_{d}] e_{m}^{2} (k + 1) - \frac{1}{g_{1}} e_{m}^{2} (k) - 2 ε_{u} e_{m} (k + 1) - σ {‖ \tilde{w_{d}} (k) ‖}^{2} \\ + σ ‖ w_{d}^{*} ‖^{2} + σ (- 1 + γ l_{d} + γ σ) {‖ \hat{w_{d}} (k) ‖}^{2} \end{matrix}

(55)

Considering (26) and

g_{1} ⩾ |\frac{\partial L}{\partial u}| > ϵ > 0

, we can deduce

\begin{matrix} \frac{1}{g_{1}} - \frac{2}{L_{u}} ⩽ \frac{1}{g_{1}} - \frac{2}{g_{1}} = - \frac{1}{g_{1}} < 0 \end{matrix}

(56)

With Lemma 2 and Lemma 3, we can further deduce

\begin{matrix} Δ J_{2} (k) & ⩽ [- \frac{1}{g_{1}} + γ (1 + σ) l_{d} + \frac{1}{k_{0}}] e_{m}^{2} (k + 1) + σ (γ l_{d} + γ σ - 1) {‖ \hat{w_{d}} (k) ‖}^{2} \\ - \frac{1}{g_{1}} e_{m}^{2} (k) - σ {‖ \bar{w_{d}} (k) ‖}^{2} + σ ω_{m}^{2} + k_{0} ε_{l}^{2} \\ = - [\frac{1}{g_{1}} - (1 + σ) l_{d} γ - \frac{1}{k_{0}}] e_{m}^{2} (k + 1) + σ [(l_{d} + σ) γ - 1] {‖ \hat{w_{d}} (k) ‖}^{2} \\ - \frac{1}{g_{1}} [e_{m}^{2} (k) - β] - σ {‖ \tilde{w_{d}} (k) ‖}^{2} \end{matrix}

(57)

where

β = g_{1} (σ w_{m}^{2} + k_{0} ε_{l}^{2})

is a positive constant.

Next, we consider the following Lyapunov function

J_{3} (k) + J_{4} (k) = w_{c i} {(k)}^{T} w_{c i} (k) + w_{a i} {(k)}^{T} w_{a i} (k) .

(58)

Then, differencing it according to the Lyapunov function of (58) yields

\begin{matrix} Δ J_{3} (k) + Δ J_{4} (k) & = {w_{c i} {(k + 1)}^{T} w_{c i} (k + 1) + w_{a i} {(k + 1)}^{T} w_{a i} (k + 1)} \\ - {w_{c i} {(k)}^{T} w_{c i} (k) + w_{a i} {(k)}^{T} w_{a i} (k)} \\ = a_{c} {∥ e_{c i} (k) ∥}^{2} (- 2 + a_{c} {∥ z (v_{c i}^{T} e (k)) ∥}^{2}) \\ + β_{a} ‖ e_{a i} {(k) ‖}^{2} (- 2 + β_{a} ‖ z (v_{a i}^{T} e (k)) ‖^{2}) . \end{matrix}

(59)

Finally,

Δ J (k)

is derived from (52), (57) and (59)

\begin{matrix} Δ J (k) & = Δ J_{1} (k) + Δ J_{2} (k) + Δ J_{3} (k) + Δ J_{4} (k) \\ ⩽ 4 ‖ \tilde{w_{f}} {(k) ‖}^{2} (l_{f}^{2} η - l_{f}) - σ {‖ \tilde{w_{d}} (k) ‖}^{2} - [\frac{1}{g_{1}} - (1 + σ) l_{d} γ - \frac{1}{k_{0}}] e_{m}^{2} (k + 1) \\ + σ [(l_{d} + σ) γ - 1] ‖ \hat{w_{d}} (k) ‖^{2} - \frac{1}{g_{1}} [e_{m}^{2} (k) - β] + a_{c} {∥ e_{c i} (k) ∥}^{2} (- 2 + a_{c} {∥ z (v_{c i}^{T} e (k)) ∥}^{2}) \\ + β_{a} ‖ e_{a i} {(k) ‖}^{2} (- 2 + β_{a} ‖ z (v_{a i}^{T} e (k)) ‖^{2}) . \end{matrix}

(60)

Based on the above analysis, when the parameters are selected to fulfill the following condition with

e_{m}^{2} (k) ⩾ β

\begin{matrix} 0 < η ⩽ \frac{1}{l_{f}} \\ 0 < g_{1} ⩽ k_{0} \\ 0 < (1 + σ) l_{d} γ ⩽ \frac{1}{g_{1}} - \frac{1}{k_{0}} \\ 0 < (l_{d} + σ) γ ⩽ 1 \\ a_{c} \leq 2 / {∥ z (v_{c i}^{T} e (k)) ∥}^{2} \\ β_{a} \leq 2 / {∥ z (ν_{a i}^{T} e (k)) ∥}^{2} \end{matrix}

(61)

we can obtain

Δ J (k) ⩽ 0

. This proof is completed. □

4. Simulation

In this section, in order to demonstrate the effectiveness of the proposed tracking control method, a discrete-time nonlinear system is introduced.The case is derived from [47]. We assume that the nonlinear smooth function

f \in R^{n}

is an unknown nonlinear drift function and

g \in R^{n × m}

is a known function. The corresponding

f [x (k)]

and

g (k)

are given as

\begin{matrix} f [x (k)] = [\begin{matrix} - sin (0.5 x_{2} (k)) x_{1}^{2} (k) \\ - cos (1.4 x_{2} (k)) sin (0.9 x_{1} (k)) \end{matrix}] \\ g (k) = [\begin{matrix} {(x_{1} (k))}^{2} + 1.5 & 0.1 \\ 0 & 0.2 ({(x_{1} (k) + x_{2} (k))}^{2} + 1) \end{matrix}] \end{matrix}

(62)

The reference trajectory

x_{d} (k)

for the above system is defined as

x_{d} (k) = [\begin{matrix} 0.25 s i n (10^{- 3} k) \\ 0.25 cos (10^{- 3} k) \end{matrix}]

(63)

where

t i m e (s)

of y-axis is chosen as k(1,...,10000) multiplied by

t s = 0.001

in the simulation.

The RBF networks have a three-layer structure with 2 input neurons, hidden layers have 9 neurons, and output layer have 2 neurons, the parameters

c_{i}

and

b_{i}

of the radial basis functions are chosen as

c_{i}

(i=1,2,...,9) =

[\begin{matrix} - 2 & - 1.5 & - 1.0 & - 0.5 & 0 & 0.5 & 1.0 & 1.5 & 2 \\ - 2 & - 1.5 & - 1.0 & - 0.5 & 0 & 0.5 & 1.0 & 1.5 & 2 \end{matrix}]

and

b_{j} = [2, 2]

, the initial weights

w_{0}

were chosen to be random numbers between (0,1), where the inputs to the RBF-NN identifier are chosen to be

x (k)

and the inputs to the RBF-NN steady-state control

u_{d}

are chosen to be

x_{d} (k)

. Update of weights

\hat{w_{d}}

\hat{w_{f}}

is used in (21) and (28). Because

g_{1} ⩾ \frac{\partial L}{\partial u} = 1

, we can select

g_{1} = 5

. According to

0 < g_{1} ⩽ k_{0}

of Theorem 1, we can select

k_{0} = 10

. For the control parameters

η

, because hidden layers have 9 neurons,

l = 9

0 < η ⩽ \frac{1}{l} ⩽ \frac{1}{9}

, we select

η = 0.1

. While control parameters

γ, σ

, we can know

0 < (1 + σ) 9 γ ⩽ \frac{1}{5} - \frac{1}{10} = \frac{1}{10} = 0.10

and

0 < (9 + σ) γ ⩽ 1

from Theorem 1, so selecting

γ = 0.01, σ = 0.001

. The initial state is set as

x (0) = 0

. We trained the RBF networks with 10,000 steps of acquired data , and Figure 2 and Figure 3 shows the RBF-NN identifier to approximate the tracking curves of the unknown dynamics

\tilde{f}

The performance index is select as

Q = I

and

R = I

that I is the identity matrix with appropriate dimension. For the actor network and critic network, we used the same parameter settings. The initial weights of the critic network and the actor network are chosen as random numbers between

(- 10, 10)

. The input layer have 2 neurons, the hidden layer have 15 neurons, the output layer have 2 neurons, the learning rate is 0.1. The hidden layer uses the function

t a n s i g

and the function

p u r e l i n

, the output layer uses the function

t r a i n l m

.Though parameter settings, we train the actor network and the critic network with 5000 training steps to reach the given accuracy 1e-9. Figure 4 shows the curves of the system control u. In Figure 5 and Figure 6, we can see the curves of the state trajector x and the reference trajector

x_{d}

Based on above the results, the simulation results show that this tracking technique obtains a relatively satisfactory tracking performance for partially unknown discrete-time nonlinear systems.

5. Conclusion

This paper proposed an optimal tracking control scheme through approximate dynamic programming for a class of partially unknown discrete-time nonlinear systems based on RBF-NNs . In dealing with unknown variables, two RBF-NNs are used to approximate the unknown function and the steady-state controller, respectively. Moreover, ADP algorithm are introduced to get the optimal feedback control for tracking the error dynamics, two feedforward neural networks are utilized as structures to approximate the cost function and feedback control inputs severally. Finally, simulation results show a relatively satisfactory tracking performance, which verify the effectiveness of the optimal tracking control technique. In future works, we will consider event-triggered control as well as completely unknown dynamics.

Author Contributions

All the authors contributed equally to the development of the research. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant No. 61463002, the Guizhou Province Natural Science Foundation of China under Grant No. Qiankehe Fundamentals-ZK[2021] General 322 and the Doctoral Foundation of Guangxi University of Science and Technology Grant No. Xiaokebo 22z04.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Acknowledgments

The authors thank to the Journal editors and the reviewers for their helpful suggestions and comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

Broomhead, D.S.; Lowe, D. Radial basis functions, multi-variable functional interpolation and adaptive networks. 1988. [Google Scholar]
Narendra, K.S.; Parthasarathy, K. Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks 1990, 1. [Google Scholar] [CrossRef] [PubMed]
Narendra, K.S.; Mukhopadhyay, S. Adaptive control of nonlinear multivariable systems using neural networks. 1994; 737–752. [Google Scholar]
Hartman, E.J.; Keeler, J.D.; Kowalski, J.M. Layered Neural Networks with Gaussian Hidden Units as Universal Approximations. Neural Computation 1990, 210–215. [Google Scholar] [CrossRef]
Park, J. Universal approximation using radial basis function networks. Neural Comput. 1993. [Google Scholar] [CrossRef]
Lewis, F.L.; Yesildirek, A.; Liu, K. Multilayer neural-net robot controller with guaranteed tracking performance. IEEE Transactions on Neural Networks 1996, 7. [Google Scholar] [CrossRef] [PubMed]
Kobayashi, H.; Ozawa, R. Kobayashi, Hiroaki , and R. Ozawa . Adaptive neural network control of tendon-driven mechanisms with elastic tendons. Automatica 2003, 1509–1519. [Google Scholar] [CrossRef]
Lewis, F.L.; et al. Optimal Control, 3rd ed.; John Wiley & Sons, Inc.: New Jersey, 2012. [Google Scholar]
Mannava, A.; et al. Optimal tracking control of motion systems. IEEE Trans. Control Syst. Technol. 2012, 1548–1558. [Google Scholar] [CrossRef]
Sharma, R.; Tewari, A. Optimal nonlinear tracking of spacecraft attitude maneuvers. IEEE Trans. Control Syst. Technol. 2013, 12, 677–682. [Google Scholar] [CrossRef]
Bellman, R.E. Dynamic Programming. Princeton University Press: Princeton, NJ, 1957. [Google Scholar]
Lewis, F.L.; Syrmos, V.L. Optimal Control; Wiley: New York, 1995. [Google Scholar]
Powell, W.B. Approximate Dynamic Programming: Solving the Curses of Dimensionality; Wiley: New York, NY, USA, 2009. [Google Scholar]
Bertsekas, D.P.; Tsitsiklis, J.N. Neuro-Dynamic Programming; Athena Scientifific: Belmont, MA, USA, 1996. [Google Scholar]
Si, J.; Barto, A.G.; Powell, W.B.; Wunsch, D. Handbook of Learning and Approximate Dynamic Programming; Wiley: New York, NY, USA, 2004. [Google Scholar]
Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
Lewis, F.L.; Vrabie, D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. 2009, 8, 32–50. [Google Scholar] [CrossRef]
Lewis, F.L.; Vrabie, D.; Vamvoudakis, K.G. Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers. IEEE Control Syst. 2012, 11, 76–105. [Google Scholar] [CrossRef]
Lewis, F.L.; Liu, D. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control; Wiley: Hoboken, NJ, USA, 2013. [Google Scholar]
Fairbank, M.; Li, S.; Fu, X.; Alonso, E.; Wunsch, D. An adaptive recurrent neural-network controller using a stabilization matrix and predictive inputs to solve a tracking problem under disturbances. Neural Netw. 2014, 1, 74–86. [Google Scholar] [CrossRef]
Vrabie, D.; Lewis, F. Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Netw. 2009, 4, 237–246. [Google Scholar] [CrossRef]
Liu, D.; Yang, X.; Li, H. Adaptive optimal control for a class of continuous-time affifine nonlinear systems with unknown internal dynamics. Neural Comput. Appl. 2013, 11, 1843–1850. [Google Scholar] [CrossRef]
Bhasin, S.; Kamalapurkar, R.; Johnson, M.; Vamvoudakis, K.G.; Lewis, F.L.; Dixon, W. E. A novel actor–critic–identififier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 2013, 1, 82–92. [Google Scholar] [CrossRef]
Al-Tamimi, A.; Lewis, F.L.; Abu-Khalaf, M. Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof. IEEE Trans. Syst. Man Cybern. B Cybern. 2008, 8, 943–949. [Google Scholar] [CrossRef]
Prokhorov, D.V.; Wunsch, D.C. Adaptive critic designs. IEEE Trans. Neural Netw. 1997, 9, 997–1007. [Google Scholar] [CrossRef] [PubMed]
Luo, Y.; Zhang, H. Approximate optimal control for a class of nonlinear discrete-time systems with saturating actuators. Prog. Natural Sci. 2008, 1023–1029. [Google Scholar] [CrossRef]
Dierks, T.; Jagannathan, S. Online optimal control of nonlinear discrete-time systems using approximate dynamic programming. Control Theory Appl. 2011, 361–369. [Google Scholar] [CrossRef]
Si, J.; Wang, Y.-T. Online learning control by association and reinforcement. IEEE Trans. Neural Netw. 2001, 5, 264–276. [Google Scholar] [CrossRef] [PubMed]
Ren, L.; Zhang, G.; Mu, C. Data-based H_∞ control for the constrained-input nonlinear systems and its applications in chaotic circuit systems. IEEE Trans. Circuits Syst. 2020, 8, 2791–2802. [Google Scholar] [CrossRef]
Zhao, F.; Gao, W.; Liu, T.; Jiang, Z.P. Event-triggered robust adaptive dynamic programming with output-feedback for large-scale systems. IEEE Trans. Control Netw. Syst. 2023, 8, 63–74. [Google Scholar] [CrossRef]
Wei, Q.; Li, H.; Yang, X.; He, H. Continuous-time distributed policy iteration for multicontroller nonlinear systems. IEEE Trans. Cybern. 2021, 5, 2372–2383. [Google Scholar] [CrossRef] [PubMed]
Zhang, Y.; Zhao, B.; Liu, D.; Zhang, S. Event-triggered control of discrete-time zero-sum games via deterministic policy gradient adaptive dynamic programming. IEEE Trans. Syst., Man, Cybern., Syst. 2022, 8, 4823–4835. [Google Scholar] [CrossRef]
Zhu, Y.; Zhao, D.; He, H. Invariant adaptive dynamic programming for discrete-time optimal control. IEEE Trans. Syst., Man, Cybern., Syst. 2020, 11, 3959–3971. [Google Scholar] [CrossRef]
Wei, Q.; Han, L.; Zhang, T. piking adaptive dynamic programming based on poisson process for discrete-time nonlinear systems. IEEE Trans. Neural Netw. Learn. Syst. 2022, 5, 1846–1856. [Google Scholar] [CrossRef]
Yang, Y.; Wunsch, D.; Yin, Y. Hamiltonian-driven adaptive dynamic programming for continuous nonlinear dynamical systems. IEEE Trans. Neural Netw. Learn. Syst. 2017, 8, 1929–1940. [Google Scholar] [CrossRef]
Li, M.; Qin, J.; Freris, N.M.; Ho, D.W. Multiplayer Stackelberg– Nash game for nonlinear system via value iteration-based integral reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 2022, 4, 1429–1440. [Google Scholar] [CrossRef] [PubMed]
Guo, X.; Yan, W.; Cui, R. Integral reinforcement learning-based adaptive NN control for continuous-time nonlinear MIMO systems with unknown control directions. IEEE Trans. Syst., Man, Cybern., Syst. 2020, 11, 4068–4077. [Google Scholar] [CrossRef]
Xue, W.; Fan, J.; Lopez, V.G.; Jiang, Y.; Chai, T.; Lewis, F.L. Off-policy reinforcement learning for tracking in continuous-time systems on two time scales. IEEE Trans. Neural Netw. Learn. Syst. 2021, 10, 4334–4346. [Google Scholar] [CrossRef]
Sun, C.; Li, X.; Sun, Y. A parallel framework of adaptive dynamic programming algorithm with off-policy learning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 8, 3578–3587. [Google Scholar] [CrossRef]
Duan, J.; Guan, Y.; Li, S.E.; Ren, Y.; Sun, Q.; Cheng, B. Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE Trans. Neural Netw. Learn. Syst. 2022, 11, 6584–6598. [Google Scholar] [CrossRef]
Qiao, L.; et al. A novel optimal tracking control scheme for a class of discrete-time nonlinear systems using generalised policy iteration adaptive dynamic programming algorithm. Syst. Sci. 2017, 525–534. [Google Scholar] [CrossRef]
Kiumarsi, B.; Lewis, F.L. Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems. IEEE Trans. Neural Networks Learn. Syst. 2017, 140–151. [Google Scholar] [CrossRef] [PubMed]
Zhang, H.; et al. Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming. IEEE Trans. Neural Networks 2011, 1851–1862. [Google Scholar] [CrossRef] [PubMed]
Zhang, H.; Wei, Q.; Luo, Y. A Novel Infinite-Time Optimal Tracking Control Scheme for a Class of Discrete-Time Nonlinear Systems via the Greedy HDP Iteration Algorithm. IEEE Trans. Syst., Man, Cybern., Syst. 2008, 937–942. [Google Scholar] [CrossRef] [PubMed]
Song, S.; Zhu, M.; Dai, X.; Gong, D. Model-Free Optimal Tracking Control of Nonlinear Input-Affine Discrete-Time Systems via an Iterative Deterministic Q-Learning Algorithm. IEEE Trans. Neural Networks Learn. Syst. 2024, 1, 999–1012. [Google Scholar] [CrossRef]
Huang, Y.; Liu, D. Neural-network-based optimal tracking control scheme for a class of unknown discrete-time nonlinear systems using iterative ADP algorithm. Neurocomputing 2014, 46–56. [Google Scholar] [CrossRef]
Dierks, T.; Jagannathan, S. Optimal tracking control of affine nonlinear discrete-time systems with unknown internal dynamics. In Proceedings of the IEEE Conference on Decision & Control IEEE; 2010. [Google Scholar] [CrossRef]
Zhang, J.; Ge, S.S.; Lee, T.H. Direct RBF neural network control of a class of discrete-time non-affine nonlinear systems. In Proceedings of the American Control Conference; 2002. [Google Scholar]
Ge, S.S.; Zhang, J.; Lee, T.H. Adaptive MNN control for a class of non-affine NARMAX systems with disturbances. Systems & Control Letters 2004, 53, 1–12. [Google Scholar] [CrossRef]

Figure 1. The structure schematic of the proposed technique.

Figure 2. The unkonwn function

f (x_{1})

and approximating the unkonwn function

\tilde{f} (x_{1})

Figure 2. The unkonwn function

f (x_{1})

and approximating the unkonwn function

\tilde{f} (x_{1})

Figure 3. The unkonwn function

f (x_{2})

and approximating the unkonwn function

\tilde{f} (x_{2})

Figure 3. The unkonwn function

f (x_{2})

and approximating the unkonwn function

\tilde{f} (x_{2})

Figure 4. Control input u.

Figure 5. The state trajector

x 1

and the reference trajector

x 1_{d}

Figure 5. The state trajector

x 1

and the reference trajector

x 1_{d}

Figure 6. The state trajector

x 2

and the reference trajector

x 2_{d}

Figure 6. The state trajector

x 2

and the reference trajector

x 2_{d}

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

MDPI Initiatives

Important Links

Choose an area of interest and we will send you notifications of new preprints at your preferred frequency.

Disclaimer

RBF Neural Networks-Based Near-Optimal Tracking Control of Partially Unknown Discrete-Time Nonlinear Systems

Abstract

1. Introduction

2. Problem Statement

3. Optimal Tracking Controller Design with Partially Unknown Dynamic

3.1. RBF-NN Identifier Design

3.2. RBF-NN Steady-State Controller Design

3.3. Near Optimal Feedback Controller Design

3.4. Stability Analysis

4. Simulation

5. Conclusion

Author Contributions

Funding

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

MDPI Initiatives

Important Links

Subscribe