We first explore the connection between distributed message queues and their key parameter configurations in the AIoT edge computing environment. Lasso regression [27] is utilized to select the key parameters and their weights, resulting in a performance prediction model. Additionally, we construct a parameter optimization model based on reinforcement learning and employ the Deep Deterministic Policy Gradient (DDPG) [28] method to optimize the parameters. This enables the current parameter configuration to maximize system throughput under the current message scale, achieving adaptive optimization of system performance. The specific optimization process is depicted in Figure 2.
4.1. Parameter Screening
In the AIoT edge computing environment, the distributed message system provides asynchronous communication, peak shaving, and decoupling capabilities, ensures sequential consumption under certain conditions, and offers redundant backup functions. The parameter configuration of the distributed message queue in this study can be modified through its configuration file. The parameters that may impact the throughput of the distributed message queue are found in the Broker and Producer configurations. After analyzing the parameter descriptions and the relevant technical discussions, twenty-two parameters that are likely to have a significant impact on throughput are selected from the hundreds of parameters available.
To generate samples for the selected parameters, the twenty-two parameters are classified by value type as discrete or continuous, where discrete parameters include categorical and discrete numerical parameters. For example, "cType" is a categorical variable with values such as "uncompressed", "producer", and "gzip", whereas "bThreads" and "rTMs" are numerical variables: "bThreads" is a discrete numerical variable and "rTMs" is a continuous numerical variable. Given the environment of the distributed message system, three candidate values are selected for each parameter: K0 (below the default), K1 (the default), and K2 (above the default). These values are chosen to reflect the characteristics of each parameter and to ensure a comprehensive evaluation of performance under different settings.
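As a concrete illustration, the sketch below enumerates such a K0/K1/K2 candidate grid with Python's itertools. The parameter names are taken from the examples above, but the candidate values are placeholders rather than the grid actually used in the study.

```python
from itertools import product

# Illustrative subset of the 22 tunables; the candidate values are placeholders,
# not the exact K0/K1/K2 grid used in the study.
PARAM_GRID = {
    "cType":    ["uncompressed", "producer", "gzip"],  # categorical
    "bThreads": [5, 10, 20],                           # discrete numerical (K0 < default < K2)
    "rTMs":     [15000.0, 30000.0, 60000.0],           # continuous numerical (milliseconds)
}

def enumerate_configs(grid):
    """Yield every combination of the K0/K1/K2 candidate values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

configs = list(enumerate_configs(PARAM_GRID))
print(len(configs))  # 3 ** len(PARAM_GRID); with all 22 parameters this grows to 3^22
```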
Upon selecting the parameter values, the original data undergoes preprocessing. The parameters of the distributed message queue fall into numerical and categorical variables, so the categorical variables must be converted into numerical form. In this study, one-hot encoding [29] is utilized to represent categorical variables, after which both discrete and continuous parameter types are numerical. One-hot encoding employs an N-bit state register to encode N categorical values, with each value having exactly one corresponding register bit. For discrete or categorical parameters with only two possible values, the default value can be used. Combining the candidate values K0, K1, and K2 of the 22 parameters then generates an initial training sample dataset containing $3^{22}$ samples. As a result, it is necessary to reduce the initial training sample dataset; after the preprocessing stage, a representative final training sample set is selected for further analysis.
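A minimal sketch of the one-hot encoding step using pandas is shown below; the column names and values are illustrative, and the study's own preprocessing pipeline may differ.

```python
import pandas as pd

# Hypothetical raw samples: "cType" is categorical, the other columns are numerical.
samples = pd.DataFrame({
    "cType":    ["uncompressed", "producer", "gzip"],
    "bThreads": [5, 10, 20],
    "rTMs":     [15000.0, 30000.0, 60000.0],
})

# One-hot encode the categorical column: one indicator bit per category value.
encoded = pd.get_dummies(samples, columns=["cType"])
print(encoded.columns.tolist())
# ['bThreads', 'rTMs', 'cType_gzip', 'cType_producer', 'cType_uncompressed']
```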
There are two primary approaches to reducing data dimensionality: projection and manifold learning. Among these techniques, Principal Component Analysis (PCA) [14], t-distributed stochastic neighbor embedding (t-SNE), and multidimensional scaling (MDS) are the most prominent. In this study, PCA is selected for dimensionality reduction because it is projection-based, well suited to the training sample dataset, and computationally faster than the alternatives. PCA is therefore employed to reduce the dimensionality of the initial training sample set, with the detailed procedure outlined in Algorithm 2. The resulting dataset after dimensionality reduction is denoted as Y and contains the 100 final training samples.
Algorithm 2: Dimensionality reduction method based on PCA for the initial training sample set
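The following is a minimal scikit-learn sketch of the PCA projection underlying Algorithm 2, not the algorithm itself. The matrix shape and component count are placeholder assumptions; note also that scikit-learn's PCA reduces the number of feature dimensions rather than the number of samples, so the selection of the 100 representative samples described above is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical encoded training matrix: rows are sampled configurations,
# columns are the one-hot/numerical parameter features.
rng = np.random.default_rng(0)
X = rng.random((1000, 24))       # stand-in for the initial sample set

# Project onto the leading principal components; the component count is illustrative.
pca = PCA(n_components=10)
Y = pca.fit_transform(X)

print(Y.shape)                                  # (1000, 10)
print(pca.explained_variance_ratio_.sum())      # variance retained by the projection
```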
4.2. Lasso regression-based performance modeling
The current set of parameters for the distributed message system was chosen based on documentation and expert advice from other distributed message queues. However, given the specific characteristics of the AIoT edge computing environment in this study, some parameters may have a negligible impact on the performance of the distributed message queue compared to others. Therefore, a regression algorithm can be employed to further screen the parameters based on the selected final training sample set.
Regression algorithms are supervised methods that utilize labeled samples to create a mathematical model. To develop a final performance prediction model relating the distributed message system's configuration to throughput, regression algorithms can be used to calculate parameter weights and conduct the final screening. Our study selects a comprehensive set of 100 training samples to provide a large dataset and minimize noise interference, and applies regularization to avoid overfitting. Our method utilizes Lasso regression [15] to eliminate parameters with minimal impact on performance during parameter screening, as illustrated in Algorithm 3.
Algorithm 3: Key parameter screening method based on Lasso regression
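A scikit-learn sketch of the Lasso-based screening idea in Algorithm 3 is given below; the synthetic data, the regularization strength alpha, and the zero-coefficient threshold are all illustrative assumptions rather than the study's settings.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Hypothetical final training set: Y holds the parameter features of the 100
# samples, t holds the measured throughput of each configuration.
rng = np.random.default_rng(1)
Y = rng.random((100, 22))
t = Y @ rng.normal(size=22) + rng.normal(scale=0.1, size=100)

# Standardize features so the L1 penalty treats all parameters comparably,
# then fit Lasso; alpha controls how aggressively weights are driven to zero.
scaler = StandardScaler()
model = Lasso(alpha=0.05)
model.fit(scaler.fit_transform(Y), t)

# Parameters whose coefficients shrink to zero are screened out; the survivors
# are the key parameters together with their weights.
key = [(i, w) for i, w in enumerate(model.coef_) if abs(w) > 1e-6]
print(len(key), "key parameters retained")
```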
The significant configuration parameters of large-scale distributed message systems identified in this section are determined through the aforementioned steps, along with the corresponding weight magnitudes indicating their impact on performance, as shown in Table 1.
To attain performance optimization of the distributed message system in the AIoT edge network, a preliminary performance model is built from the final selection of 14 key parameters, as shown in the equation below:

$$T = \sum_{i=1}^{14} w_i x_i + C$$

where $x_1, \dots, x_{14}$ are the key parameters, $w_i$ are the corresponding weights obtained from the Lasso regression, and $C$ is a constant that takes different values in different distributed message system scenarios.
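For illustration, the linear model can be evaluated as in the short sketch below; the weight values and constant are placeholders, since the actual coefficients come from the Lasso fit (Table 1) and vary by scenario.

```python
# Hypothetical weights and constant for the 14 retained key parameters;
# the real values are produced by the Lasso regression and differ per scenario.
WEIGHTS = [0.8, -0.3, 0.5, 0.1, -0.2, 0.4, 0.05, -0.6, 0.3, 0.2, -0.1, 0.7, 0.15, -0.25]
CONSTANT = 120.0

def predict_throughput(x):
    """Linear performance model: T = sum_i w_i * x_i + C."""
    assert len(x) == len(WEIGHTS)
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + CONSTANT

print(predict_throughput([1.0] * 14))
```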
4.3. Parameter optimization method based on deep deterministic policy gradient algorithm
DDPG is a powerful deep reinforcement learning algorithm [28] that effectively handles continuous action spaces in high-dimensional environments by combining Q-learning and Actor-Critic approaches. The algorithm employs an Actor network to generate actions and a Critic network to estimate the Q-value function, and it updates the policy by computing the gradient of the Q-value function with respect to the policy parameters and using it to update the Actor network. DDPG also utilizes experience replay and target networks to ensure stability during training and prevent overfitting. In contrast, DQN is a popular reinforcement learning method that uses deep learning to handle high-dimensional state spaces by approximating the Q-value function with a neural network and treating rewards as labels for training the network. However, DQN is designed for discrete action spaces and may not be suitable for continuous ones: in continuous action spaces, approximating the Q-value function directly with a neural network is difficult, limiting the algorithm's ability to handle high-dimensional continuous actions. Thus, in this study, we use the DDPG algorithm illustrated in Figure 3 to effectively address the challenges posed by continuous action spaces in high-dimensional environments.
The DDPG algorithm utilizes two neural networks [28], the Actor network and the Critic network, to handle continuous action spaces effectively. To prevent complications during the update process caused by constant changes to the target, the DDPG algorithm employs separate current and target networks. As a result, there are four neural networks: the current Critic network, the current Actor network, the Critic target network, and the Actor target network. The current Actor network takes the current state as input and outputs the action to be executed in the next step. The current Critic network evaluates the current Q-value: it takes the current state and the action output by the Actor network as input and outputs the Q-value.
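A possible PyTorch realization of the two network types is sketched below; the layer widths, activations, and the tanh-bounded action output are assumptions, since the paper's exact architectures are not specified here.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state (features of the current parameter configuration) to an action."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # actions bounded to [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Maps a (state, action) pair to a scalar Q-value."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```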
To keep the two target networks in line with the two current networks, the DDPG algorithm periodically performs a soft update of the parameters of the Actor target network and the Critic target network toward those of the current Actor network and current Critic network, respectively. The Actor target network selects the next action A′ based on the next state S′ sampled from the experience replay pool, while the Critic target network uses this pair to calculate the target Q-value. The detailed algorithmic description is shown in Algorithm 4.
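The soft update itself can be written compactly as in the PyTorch sketch below; the coefficient value 0.005 is an illustrative choice, not the study's setting.

```python
import copy
import torch
import torch.nn as nn

def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.005) -> None:
    """Blend target parameters toward the current network: theta' <- tau*theta + (1 - tau)*theta'."""
    with torch.no_grad():
        for tgt, src in zip(target.parameters(), source.parameters()):
            tgt.mul_(1.0 - tau).add_(tau * src)

# Usage: each target network starts as a copy of its current network and is then
# nudged toward it after every learning step.
actor = nn.Linear(14, 14)            # stand-in for the Actor network
actor_target = copy.deepcopy(actor)
soft_update(actor_target, actor, tau=0.005)
```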
Algorithm 4: DDPG based distributed message system configuration optimization (DMSCO)
We commence by initializing four deep neural networks: the current (evaluation) Critic network, the current Actor network, the target Critic network, and the target Actor network, with learnable parameters w, θ, w′, and θ′, respectively, used to approximate the Q-value function and the policy function. At each time step, the current state and action are input to the current Critic and Actor networks, yielding the Q-value and the action. Subsequently, several hyperparameters are specified, including the soft update coefficient τ, the discount factor γ, the experience replay buffer, the batch size m for batch gradient descent, the target Q-network update frequency C, and the maximum number of iterations T. Additionally, a random noise function is initialized to enhance learning coverage and introduce stochasticity during training. Finally, the first state in the state sequence is designated as the initial state, from which the learning algorithm proceeds. With these initialization steps complete, the DDPG algorithm can be trained to obtain an optimized policy for the given task.
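The initialization step might look like the sketch below; every hyperparameter value is a placeholder, and plain Gaussian exploration noise is used here for simplicity even though DDPG implementations often use Ornstein-Uhlenbeck noise.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DDPGConfig:
    """Illustrative hyperparameter values; the study's actual settings may differ."""
    tau: float = 0.005          # soft update coefficient
    gamma: float = 0.99         # discount factor
    buffer_size: int = 10000    # experience replay capacity
    batch_size: int = 32        # m, samples per gradient step
    target_update: int = 1      # C, target network update frequency
    max_iters: int = 10000      # T, maximum number of iterations

class GaussianNoise:
    """Random exploration noise added to the Actor's output action."""
    def __init__(self, action_dim, sigma=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.action_dim = action_dim
        self.sigma = sigma

    def sample(self):
        return self.rng.normal(0.0, self.sigma, size=self.action_dim)

cfg = DDPGConfig()
noise = GaussianNoise(action_dim=14)   # one noise component per key parameter
```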
We optimize the distributed message system by first mapping the value range of the final key parameter set (whose values have all been converted to a unified numerical form) to the state space. The current state S is then defined by the throughput of the system under the current key parameter configuration. The action A is specified to increase, keep constant, or decrease each key parameter. The reward R is computed from the ratio of the throughput in the current state to the throughput in the previous state after performing action A: if the ratio is greater than 1, the reward is 1; if the ratio is less than 1, the reward is a negative penalty; otherwise, the reward is 0.
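A minimal reward function consistent with this description is sketched below, assuming a symmetric −1 penalty for a throughput drop, a value the text above does not pin down.

```python
def compute_reward(current_throughput: float, previous_throughput: float) -> int:
    """+1 if the new configuration improved throughput, -1 (assumed penalty) if it
    degraded it, and 0 if the throughput ratio equals 1."""
    if previous_throughput <= 0:
        return 0                      # guard against an unmeasured baseline
    ratio = current_throughput / previous_throughput
    if ratio > 1:
        return 1
    if ratio < 1:
        return -1
    return 0

print(compute_reward(1200.0, 1000.0))   # 1
print(compute_reward(900.0, 1000.0))    # -1
```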
Moreover, the key parameters of the distributed message system are initially set to their default values to form the initial state sequence S, and the feature vector of this state sequence is computed to represent the current parameter configuration. The current state S is subsequently fed into the Actor network, which uses the policy gradient calculation described in Equation (2) to determine the action that the network outputs.
To update the parameter values of the distributed message system, action A is executed, resulting in a new state S′. The reward R is then computed by comparing the throughput of the new state with that of the previous state S. The new state is observed and its feature vector is obtained. The quadruple (S, A, R, S′) is then stored in the experience replay pool. If the reward R equals 1, the current state is updated to S′; otherwise, the current state remains unchanged. Through this process, the distributed message system can optimize its performance by learning from past experiences.
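The replay pool and the conditional state update can be sketched as follows; the buffer capacity is an arbitrary placeholder and the `step` helper is purely illustrative, not part of the paper's implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity pool of (S, A, R, S') transitions."""
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.pool.append((state, action, reward, next_state))

    def sample(self, m=32):
        return random.sample(list(self.pool), m)

def step(buffer, state, action, reward, next_state):
    """Store the (S, A, R, S') transition and advance the state only when the
    action improved throughput (reward == 1), as described above."""
    buffer.store(state, action, reward, next_state)
    return next_state if reward == 1 else state
```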
Our proposed DMSCO algorithm randomly samples 32 experience tuples (S, A, R, S′) from the experience replay pool, and the target Q-value is calculated as in Eq. (3).
Then, the algorithm minimizes the MSE loss in Eq. (4) to update the current (evaluation) Critic network. Finally, DMSCO calculates the loss function in Eq. (5) and updates the Actor network by gradient ascent as in Eq. (6).
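One way to realize this mini-batch update in PyTorch is sketched below; it assumes network and optimizer objects shaped like the earlier sketches, treats the reward tensor as shape (m, 1), and omits terminal-state handling, so it matches the description above only approximately.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_target, critic_target,
                actor_opt, critic_opt, gamma=0.99):
    """One gradient step on a sampled mini-batch of (S, A, R, S') transitions."""
    state, action, reward, next_state = batch          # tensors; reward has shape (m, 1)

    # Target Q-value: y = R + gamma * Q'(S', mu'(S'))   (cf. Eq. (3))
    with torch.no_grad():
        next_action = actor_target(next_state)
        target_q = reward + gamma * critic_target(next_state, next_action)

    # Critic update: minimize the MSE between Q(S, A) and the target   (cf. Eq. (4))
    critic_loss = F.mse_loss(critic(state, action), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: gradient ascent on Q(S, mu(S)), i.e. minimize its negative
    # (cf. Eqs. (5) and (6))
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    return critic_loss.item(), actor_loss.item()
```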
The iteration continues until the maximum number of iterations is reached, at which point the training process terminates. The final output is the configuration of parameters in the optimal action A, which represents the optimal configuration of the distributed message system for the present message transmission scenario.