1. Introduction
As a kind of unmanned flight platform, the quadrotor UAV has the advantages of a simple structure, a lightweight fuselage, and low cost, and it is widely used in numerous tasks such as aerial photography, agricultural plant protection, rescue and relief, remote sensing mapping, and reconnaissance [1,2,3,4]. This wide range of application scenarios also imposes strict requirements on its flight control capability, especially the attitude control during flight [4,5,6]. However, the lightweight fuselage of the quadrotor leads to a poor ability to resist external disturbances, which reduces the accuracy of attitude control.
There has been extensive research on attitude control methods for quadrotors. Linear control methods such as proportional-integral-derivative (PID) control [7,8,9] and the linear quadratic regulator (LQR) [10] have been widely used in engineering practice, owing to their simple structure and easy implementation. PID and LQ methods were applied to the attitude angle control of a micro quadrotor, and the control laws were validated by autonomous flight experiments in the presence of external disturbances [11]. A robust PID control methodology was proposed for quadrotor UAV regulation, which could reduce power consumption and perform well under parameter uncertainties and aerodynamic interferences [12]. Twelve PID coefficients of a quadrotor controller were optimized by four classical evolutionary algorithms, respectively, and the simulation results indicated that the coefficients obtained from the differential evolution (DE) algorithm minimized the energy consumption compared with the other algorithms [7]. While linear or coefficient-optimized linear controllers may be suitable for some of the above scenarios, the nonlinear effects of the quadrotor dynamics are often non-negligible [13], and linear control methodologies fall short because of their reliance on approximately linearized dynamical models. Various control approaches have been applied to quadrotors considering the nonlinear dynamics model. One of these approaches is nonlinear dynamic inversion (NDI), which can theoretically eliminate the nonlinearities of the control system [14], but this method depends heavily on model accuracy [15]. The incremental nonlinear dynamic inversion (INDI) methodology was used to improve robustness against model inaccuracies and could achieve stable attitude control even for large changes in pitch angle [16]. Adaptive control algorithms have also been widely used in quadrotor systems [17,18]. Two adaptive control laws were designed for the attitude stabilization of a quadrotor to deal with parametric uncertainty and external disturbance [18]. A robust adaptive control strategy was developed for the attitude tracking of foldable quadrotors, which were modeled as switched systems [19].
Due to its fast response and strong robustness, the sliding mode control (SMC) methodology has been widely applied to the attitude tracking of quadrotors [20,21]. However, the problem of control input chattering is apparent when the traditional reaching law is adopted in SMC. A fuzzy logic system was designed to schedule the control gains of the sign function adaptively, which could effectively suppress control signal chattering [22]. A novel discrete-time sliding mode control (DSMC) reaching law was proposed based on theoretical analysis, which could reduce the chattering significantly [23]. An adaptive fast nonsingular terminal sliding mode (AFNTSM) controller was introduced to achieve attitude regulation and suppress the chattering phenomenon, and its effectiveness was verified by experiments [24]. A fractional-order sliding mode surface was designed to adjust the parameters of SMC adaptively in the fault-tolerant control of a quadrotor model with mismatched disturbances [25].
The above studies provide valuable references. However, control signal chattering still requires further attention when the SMC method is applied to attitude regulation under external disturbances. With the development of artificial intelligence technology, more and more reinforcement learning algorithms have been combined with traditional control methodologies [26,27]. Inspired by these studies, a deep deterministic policy gradient (DDPG) [28] agent is introduced into the SMC in this paper. The parameters associated with the sign function can be regulated adaptively by the trained DDPG agent, so that the control input chattering can be suppressed in attitude control in the presence of external disturbances.
The main contributions of our work are outlined as follows:
(1) A sliding mode controller is designed for the attitude control of a quadrotor UAV in the presence of external disturbances.
(2) A reinforcement learning agent based on DDPG is trained to adjust the switching control gain of the traditional SMC method adaptively.
(3) The proposed DDPG-SMC approach can suppress the chattering phenomenon that arises in attitude control with the traditional SMC method.
The remainder of this paper is organized as follows. Section 2 introduces the attitude dynamics modeling of the quadrotor UAV. In Section 3, the traditional SMC and the proposed DDPG-SMC are designed to solve the attitude control problem. In Section 4, the robustness and effectiveness of the proposed control approach are validated through simulation results, followed by key conclusions in Section 5.
2. Attitude Dynamics Modeling for Quadrotor UAV
The quadrotor is considered as a rigid body whose attitude motion can be described by two coordinate frames, an inertial reference frame (frame I) and a body reference frame (frame B), as shown in Figure 1. The attitude motion of the quadrotor is achieved by the rotation of its propellers. The attitude angles can be described as $\Theta = [\phi, \theta, \psi]^{T}$ in frame B, where $\phi$, $\theta$ and $\psi$ are the roll angle (rotation around the x-axis), pitch angle (rotation around the y-axis) and yaw angle (rotation around the z-axis), respectively. The attitude angular velocities are expressed as $\omega = [p, q, r]^{T}$, where $p$, $q$ and $r$ are the angular velocities in the roll, pitch and yaw directions, respectively.
According to the relationship between the angular velocities and the attitude rates, the attitude kinematics equation of the quadrotor can be expressed as follows [29]:

$$\dot{\Theta} = W\omega \tag{1}$$

where

$$W = \begin{bmatrix} 1 & \sin\phi\tan\theta & \cos\phi\tan\theta \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi/\cos\theta & \cos\phi/\cos\theta \end{bmatrix} \tag{2}$$
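As a minimal numerical sketch of the kinematics in Equations (1) and (2) (assuming the Z-Y-X Euler-angle convention implied above; the function name and example values are illustrative only, not taken from the paper):

```python
import numpy as np

def euler_rates(phi, theta, p, q, r):
    """Map body angular rates (p, q, r) to Euler-angle rates via Eqs. (1)-(2)."""
    W = np.array([
        [1.0, np.sin(phi) * np.tan(theta), np.cos(phi) * np.tan(theta)],
        [0.0, np.cos(phi),                 -np.sin(phi)],
        [0.0, np.sin(phi) / np.cos(theta), np.cos(phi) / np.cos(theta)],
    ])
    return W @ np.array([p, q, r])

# Example: small roll/pitch angles, pure roll rate
print(euler_rates(phi=0.05, theta=0.10, p=0.2, q=0.0, r=0.0))
```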
The attitude dynamics equation of the quadrotor can be written as follows [30]:

$$\begin{aligned} \dot{p} &= \frac{J_{y}-J_{z}}{J_{x}}qr + \frac{u_{\phi}}{J_{x}} \\ \dot{q} &= \frac{J_{z}-J_{x}}{J_{y}}pr + \frac{u_{\theta}}{J_{y}} \\ \dot{r} &= \frac{J_{x}-J_{y}}{J_{z}}pq + \frac{u_{\psi}}{J_{z}} \end{aligned} \tag{3}$$

where $J_{x}$, $J_{y}$ and $J_{z}$ are the moments of inertia along the $x$, $y$ and $z$ axes, respectively; $u = [u_{\phi}, u_{\theta}, u_{\psi}]^{T}$ denotes the control inputs, and $u_{\phi}$, $u_{\theta}$ and $u_{\psi}$ are the control torques in the roll, pitch and yaw directions, respectively. The control inputs $u$ developed by the four propellers can be defined as follows [31,32]:

$$u = \begin{bmatrix} u_{\phi} \\ u_{\theta} \\ u_{\psi} \end{bmatrix} = \begin{bmatrix} l\,(F_{4}-F_{2}) \\ l\,(F_{3}-F_{1}) \\ \dfrac{c_{d}}{c_{l}}\,(F_{1}-F_{2}+F_{3}-F_{4}) \end{bmatrix} \tag{4}$$

where the parameter $c_{l}$ denotes the lift coefficient, the parameter $c_{d}$ represents the drag coefficient, the parameter $l$ is the length between the quadrotor's center of mass and the rotation axis of any propeller, and $F_{i}$ represents the thrust force provided by the $i$th propeller.
When the external disturbances are taken into account, the attitude dynamics Equation (3) can be rewritten as

$$\begin{aligned} \dot{p} &= \frac{J_{y}-J_{z}}{J_{x}}qr + \frac{u_{\phi}+d_{\phi}}{J_{x}} \\ \dot{q} &= \frac{J_{z}-J_{x}}{J_{y}}pr + \frac{u_{\theta}+d_{\theta}}{J_{y}} \\ \dot{r} &= \frac{J_{x}-J_{y}}{J_{z}}pq + \frac{u_{\psi}+d_{\psi}}{J_{z}} \end{aligned} \tag{5}$$

where $d = [d_{\phi}, d_{\theta}, d_{\psi}]^{T}$ denotes the external disturbances.
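To make the model concrete, the following is a minimal simulation sketch of Equation (5) in compact vector form, $J\dot{\omega} = -\omega\times(J\omega) + u + d$; the inertia values, time step, and function names are illustrative assumptions, not parameters from the paper:

```python
import numpy as np

# Illustrative inertia matrix (kg*m^2); not the paper's values.
J = np.diag([0.02, 0.02, 0.04])
J_inv = np.linalg.inv(J)

def omega_dot(omega, u, d):
    """Disturbed rigid-body attitude dynamics, Eq. (5) in vector form."""
    return J_inv @ (-np.cross(omega, J @ omega) + u + d)

def euler_step(omega, u, d, dt=0.02):
    """One explicit-Euler integration step of the body angular rates."""
    return omega + dt * omega_dot(omega, u, d)

omega = np.zeros(3)                       # start at rest
u = np.array([0.01, 0.0, 0.0])            # small roll torque
d = np.array([0.002, 0.0, 0.0])           # constant disturbance torque
for _ in range(50):
    omega = euler_step(omega, u, d)
print(omega)
```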
3. Control Design for Attitude Control
In view of the attitude control problem in the presence of external disturbances, a sliding mode controller, including its sliding mode surface and reaching law, is designed for the quadrotor dynamic system, and the stability of the designed SMC system is established by the Lyapunov stability theorem. Then a reinforcement learning agent based on DDPG is trained and applied to the above SMC method without affecting the stability of the system.
3.1. SMC Design
In this section, a sliding mode controller is designed for the attitude regulation of the quadrotor. The control objective can be described as follows: the actual attitude needs to be regulated to the desired attitude asymptotically, i.e., $\Theta \rightarrow \Theta_{d}$ as $t \rightarrow \infty$.
In the controller design process, the sliding mode surface is first selected, then the control law is chosen to compute the control signal, and finally the stability of the designed SMC system is proven by the Lyapunov stability theorem. The control scheme of SMC for attitude tracking is depicted in Figure 2.
The specific design of the sliding mode controller can be expressed as follows.
SMC Algorithm
Input: (1) Desired attitude angles $\Theta_{d}$; (2) Actual attitude angles $\Theta$; (3) Model parameters of the quadrotor.
Output: The control signal $u$ for the attitude dynamics model.
Step 1: Design of the control signal
  (a) Define the sliding mode surface $s$
  (b) Select the reaching law
  (c) Compute the control signal $u$
Step 2: Proof of the stability of the closed-loop system
  (a) Select a Lyapunov candidate function $V$
  (b) Calculate the first-order derivative of $V$
  (c) Analyze the sign of the above derivative of $V$
  (d) Conclude the convergence of the attitude motion
Step 3: Termination
If the attitude control errors meet the requirements, terminate the algorithm and output the control signal $u$. Otherwise, go to Step 1 until the control errors converge.
Step 1 (a):
The control error can be defined as

$$e = \Theta - \Theta_{d} \tag{6}$$

Then the sliding mode surface can be derived as

$$s = \dot{e} + Ce \tag{7}$$

where $C = \mathrm{diag}(c_{1}, c_{2}, c_{3})$, and $c_{1}$, $c_{2}$, $c_{3}$ are selected positive numbers.
The derivative of $s$ can be expressed as

$$\dot{s} = \ddot{e} + C\dot{e} \tag{8}$$
Substituting Equations (1)-(3) and (6) into (8), we can obtain

$$\dot{s} = \dot{W}\omega + WJ^{-1}\left[-\omega\times(J\omega) + u + d\right] - \ddot{\Theta}_{d} + C\dot{e} \tag{9}$$

where $J = \mathrm{diag}(J_{x}, J_{y}, J_{z})$, and we can define

$$F = \dot{W}\omega - WJ^{-1}\left[\omega\times(J\omega)\right] - \ddot{\Theta}_{d} + C\dot{e}, \qquad B = WJ^{-1} \tag{10}$$

Equation (9) can be rewritten as

$$\dot{s} = F + Bu + Bd \tag{11}$$
Assumption 1: The external disturbance $d$ is assumed to be bounded and satisfies

$$\left|(Bd)_{i}\right| \le D, \quad i = 1, 2, 3 \tag{12}$$

in which $D$ is a positive finite variable.
Step 1 (b):
The reaching law of the sliding mode surface can be selected as follows [33]:

$$\dot{s} = -Ks - \varepsilon\,\mathrm{sgn}(s) \tag{13}$$

in which $K$ and $\varepsilon$ are both diagonal positive definite matrices, with $K = \mathrm{diag}(k_{1}, k_{2}, k_{3})$, where $k_{1}$, $k_{2}$, $k_{3}$ are selected positive numbers; likewise $\varepsilon = \mathrm{diag}(\varepsilon_{1}, \varepsilon_{2}, \varepsilon_{3})$, where each $\varepsilon_{i}$ ($\varepsilon_{i} > D$) is also a selected positive number, and $\mathrm{sgn}(\cdot)$ represents the sign function.
Step 1 (c):
Based on the calculation of the angular velocity $\omega$ and the transformation matrix $W$, as well as the derivation of Equations (9) and (13), the control signal for the attitude dynamics model can be designed as follows:

$$u = B^{-1}\left[-F - Ks - \varepsilon\,\mathrm{sgn}(s)\right] = JW^{-1}\left[-F - Ks - \varepsilon\,\mathrm{sgn}(s)\right] \tag{14}$$
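As a sketch of how the control signal of Equation (14) could be computed numerically, under the reconstruction of Equations (6)-(13) above (the function name, argument list, and any gain values are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def smc_control(theta, theta_d, theta_dot, theta_dot_d, theta_ddot_d,
                omega, W, W_dot, J, C, K, eps):
    """Sliding mode control signal, Eq. (14), for the surface (7) and reaching law (13)."""
    e = theta - theta_d                                    # control error, Eq. (6)
    e_dot = theta_dot - theta_dot_d
    s = e_dot + C @ e                                      # sliding surface, Eq. (7)
    B = W @ np.linalg.inv(J)
    F = W_dot @ omega - B @ np.cross(omega, J @ omega) - theta_ddot_d + C @ e_dot  # Eq. (10)
    u = np.linalg.solve(B, -F - K @ s - eps @ np.sign(s))  # Eq. (14)
    return u, s
```

Here C, K and eps are the diagonal positive definite gain matrices selected by the designer.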
Step 2:
The stability of the closed-loop system is proven as follows.
Theorem 1.
Consider the attitude dynamics system described by Equation (5), with the sliding mode surface selected as Equation (7). If the exponential reaching law is chosen as Equation (13) and the control signal for the attitude dynamics model is designed as Equation (14), then the designed SMC system is stable: the sliding mode surface is reached in finite time and the actual attitude converges to the desired attitude asymptotically.
Proof of Theorem 1. We can select a Lyapunov candidate function as

$$V = \frac{1}{2}s^{T}s \tag{15}$$

Based on Equation (11), taking the derivative of Equation (15) with respect to time, we can obtain

$$\dot{V} = s^{T}\dot{s} = s^{T}\left(F + Bu + Bd\right) \tag{16}$$

Then, substituting (14) into (16), we have

$$\dot{V} = s^{T}\left[-Ks - \varepsilon\,\mathrm{sgn}(s) + Bd\right] \tag{17}$$

We can assume that the sliding mode surface $s \neq 0$ and obtain the following inequality:

$$\dot{V} = -s^{T}Ks - \sum_{i=1}^{3}\varepsilon_{i}\left|s_{i}\right| + s^{T}Bd \le -s^{T}Ks - \sum_{i=1}^{3}\left(\varepsilon_{i} - D\right)\left|s_{i}\right| \tag{18}$$

Based on the selection of the diagonal positive definite matrix $K$ and the switching gains $\varepsilon_{i} > D$, we can obtain the following expression:

$$\dot{V} < 0 \tag{19}$$
Remark 1. From (19), the designed control law (14) can guarantee the stability of the closed-loop system based on the Lyapunov stability theorem. The attitude tracking error will converge to zero asymptotically once the sliding mode surface reaches zero. Consequently, the stability proof of the designed SMC system has been completed.
3.2. DDPG-SMC Design
The above derivations have proven that the control error converges to zero asymptotically under the designed SMC for the nonlinear system (5). However, high-frequency chattering of the control signal will appear near the sliding surface, because the selected reaching law (13) contains a sign function, and the chattering intensity is determined by the parameter associated with the sign function, i.e., the switching control gain $\varepsilon$.
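The discrete-time illustration below shows how a fixed switching gain produces this chattering; it is a hedged toy example with arbitrary values, not a simulation of the quadrotor model:

```python
import numpy as np

# Scalar sliding variable driven by the reaching law (13) with a fixed gain:
# ds/dt = -k*s - eps*sign(s) + d, integrated with a finite time step.
dt, k, eps, d = 0.02, 2.0, 0.8, 0.3      # illustrative values only
s, trace = 1.0, []
for _ in range(500):
    s += dt * (-k * s - eps * np.sign(s) + d)
    trace.append(s)
# Near s = 0 the sign term overshoots at every step, so s (and hence the
# control input) oscillates with an amplitude on the order of eps*dt
# instead of settling smoothly, which is the chattering phenomenon.
```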
Inspired by the combination of reinforcement learning algorithms and traditional control methodologies, a reinforcement learning agent based on DDPG is trained to adjust the switching control gain adaptively. During the training process, the input signals of the agent are the actual and desired attitude angles, and the output action is the time-varying control gain. The trained agent is then applied as a parameter regulator for the designed SMC, and the block diagram of the designed DDPG-SMC is shown in Figure 3.
The architecture of the DDPG-based parameter regulator is shown in Figure 4.
DDPG is an algorithm for solving continuous-action problems within the Actor-Critic (AC) framework [34], in which the policy network parameters are continuously optimized so that the output action a receives increasingly higher scores from the value network. In the DDPG-SMC approach designed in this paper, the DDPG agent needs to be trained beforehand. The system described in Figure 3 serves as the training environment, and the training data are derived from multiple flight simulations.
The basic principle of DDPG algorithm can be introduced as follows.
DDPG Algorithm
Input: experience replay buffer $R$; initial critic network Q-function parameters $\theta^{Q}$; actor network policy parameters $\theta^{\mu}$; target networks $Q'$ and $\mu'$.
Initialize the target network parameters: $\theta^{Q'} \leftarrow \theta^{Q}$, $\theta^{\mu'} \leftarrow \theta^{\mu}$.
for episode = 1 to M do
  Initialize a stochastic process $\mathcal{N}$ to add exploration to the action.
  Observe the initial state $s_{1}$.
  for time step t = 1 to T do
    Select the action $a_{t} = \mu(s_{t}\,|\,\theta^{\mu}) + \mathcal{N}_{t}$.
    Perform the action $a_{t}$ and transfer to the next state $s_{t+1}$, then acquire the reward value $r_{t}$ and the termination signal.
    Store the state transition data $(s_{t}, a_{t}, r_{t}, s_{t+1})$ in the experience replay buffer $R$.
    Sample a random minibatch of $N$ transitions $(s_{i}, a_{i}, r_{i}, s_{i+1})$ from $R$ and calculate the target function:
      $y_{i} = r_{i} + \gamma\, Q'\!\left(s_{i+1}, \mu'(s_{i+1}\,|\,\theta^{\mu'})\,|\,\theta^{Q'}\right)$
    Update the critic network by minimizing the loss function:
      $L = \frac{1}{N}\sum_{i}\left(y_{i} - Q(s_{i}, a_{i}\,|\,\theta^{Q})\right)^{2}$
    Update the actor network by the policy gradient method:
      $\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_{i}\nabla_{a} Q(s, a\,|\,\theta^{Q})\big|_{s=s_{i},\,a=\mu(s_{i})}\,\nabla_{\theta^{\mu}}\mu(s\,|\,\theta^{\mu})\big|_{s_{i}}$
    Update the target networks:
      $\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\theta^{Q'}$, $\quad\theta^{\mu'} \leftarrow \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$
  end for
end for
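For readers who prefer code, the following is a compact PyTorch sketch of the per-minibatch updates in the table above; the network classes, optimizers, replay buffer, and hyperparameter values are assumed to be provided elsewhere and are not taken from the paper:

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    """One DDPG update on a minibatch (s, a, r, s2, done) of stored transitions."""
    s, a, r, s2, done = batch

    # Target value: y = r + gamma * Q'(s', mu'(s'))
    with torch.no_grad():
        y = r + gamma * (1.0 - done) * critic_targ(s2, actor_targ(s2))

    # Critic update: minimize the mean-squared Bellman error
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: ascend Q(s, mu(s)) by minimizing its negative
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft (Polyak) update of the target networks
    for net, targ in ((critic, critic_targ), (actor, actor_targ)):
        for p, p_targ in zip(net.parameters(), targ.parameters()):
            p_targ.data.mul_(1.0 - tau).add_(tau * p.data)
```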
The design of the DDPG-based parameter regulator consists of two processes: training and validation. In the training process, flight simulations of the quadrotor are carried out and all the state and control data of the quadrotor are collected, that is, experience is accumulated. Then, according to the accumulated experience data, the neural network parameters are optimized and updated by gradient calculation, stochastic gradient descent, and related methods. After multiple episodes of iterative training, the policy converges to the optimal one. The validation process is used to verify the feasibility and generalization of the trained agent's optimal policy.
To train the DDPG agent to adjust the switching control gain $\varepsilon$, the number of training episodes was set to 200, with a simulation time of 10 s and a time step of 0.02 s for each episode. Fixed initial and desired attitude angles were specified for the training. The input signals of the agent were chosen as the actual and desired attitude angles ($\Theta$, $\Theta_{d}$), and the output action was the time-varying control gain $\varepsilon$.
The cumulative reward after each episode of training was recorded and output, and the reward at each step was calculated from the attitude control errors, with the weighting coefficients selected as fixed values.
The training process was stopped when the average cumulative reward was less than -1 or when the number of training episodes reached 200. The final training result is shown in Figure 5; it can be seen that the reward value converges to its maximum at the 170th training episode. This indicates that the agent has completed training and can be introduced as a parameter regulator in the above sliding mode controller.
In order to verify the generalization of the trained agent's optimal policy, it is necessary to test the control performance of the UAV model under different flight conditions. Specifically, it is necessary to evaluate the improvement in control performance obtained by adjusting the control parameters adaptively under different flight conditions. The relevant numerical simulation results are shown in Section 4.3.
Remark 2. By using the designed parameter regulator based on the trained DDPG agent, the switching control gain $\varepsilon$ related to the reaching law can be adjusted adaptively according to the attitude control error.
Therefore, the control signal in DDPG-SMC can be represented as

$$u = B^{-1}\left[-F - Ks - \varepsilon(t)\,\mathrm{sgn}(s)\right] \tag{20}$$

in which $\varepsilon(t) = \mathrm{diag}\left(\varepsilon_{1}(t), \varepsilon_{2}(t), \varepsilon_{3}(t)\right)$ is the time-varying switching control gain related to the reaching law.
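As a final illustrative sketch (the squashing function, bounds, and names are assumptions, not the paper's implementation), the raw action produced by the trained agent can be mapped to a valid time-varying gain $\varepsilon(t)$ before being substituted into Equation (20):

```python
import numpy as np

def switching_gain_from_action(action, eps_min, eps_max):
    """Map a raw 3-dimensional agent output to a bounded, positive switching-gain
    matrix eps(t) for Eq. (20); the bounds keep each channel strictly positive."""
    a = np.tanh(np.asarray(action, dtype=float))                  # squash to (-1, 1)
    return np.diag(eps_min + 0.5 * (a + 1.0) * (eps_max - eps_min))

# Example: an arbitrary raw action mapped into gains between 0.1 and 2.0
gain = switching_gain_from_action([0.3, -1.2, 0.7], eps_min=0.1, eps_max=2.0)
print(gain)
```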
Author Contributions
Conceptualization, W.H. and Y.Y.; methodology, W.H.; software, W.H.; validation, W.H. and Y.Y.; formal analysis, W.H. and Z.L.; investigation, W.H.; resources, W.H.; data curation, W.H.; writing-original draft preparation, W.H.; writing-review and editing, Y.Y.; visualization, W.H. and Z.L.; supervision, Y.Y.; project administration, Y.Y. All authors have read and agreed to the published version of the manuscript.