2.1. Attack Traffic Detection Technology Based on Machine Learning
Machine-learning-based detection of network attack traffic is currently an active research area. Its main idea is to analyze network conditions using the characteristics of network traffic and to determine, by means of a machine learning algorithm, whether any traffic exhibits attack behavior that differs from normal traffic. The algorithms used mainly include naive Bayes, decision trees, and support vector machines, which perform anomaly detection by classifying and clustering network traffic data. Among these recognition algorithms, the most studied methods are statistics-based and behavior-based [2]. Both directions follow the traditional machine learning workflow: design a set of traffic features, build a machine learning model according to actual needs, train the model's parameters on feature data with known labels, and test its recognition performance on traffic data with unknown labels. Compared with rule-based methods, this approach avoids the manual study of ports and traffic keywords, adapts to encrypted traffic and many complex traffic patterns, and has relatively low computational complexity, so it has attracted growing attention in academia in recent years.
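The workflow described above (design features, train on labeled data, test on held-out traffic) can be illustrated with a minimal Gaussian naive Bayes sketch. The two flow features, their values, and the function names `fit_gaussian_nb` and `predict` are assumptions made for illustration, not taken from the cited work.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(features, labels):
    """Estimate a prior and per-feature mean/variance for every class."""
    grouped = defaultdict(list)
    for row, label in zip(features, labels):
        grouped[label].append(row)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor keeps every variance strictly positive.
        variances = [
            sum((v - mu) ** 2 for v in col) / n + 1e-6
            for col, mu in zip(zip(*rows), means)
        ]
        model[label] = (n / len(features), means, variances)
    return model

def predict(model, row):
    """Return the label with the highest log posterior for one sample."""
    def log_posterior(params):
        prior, means, variances = params
        log_p = math.log(prior)
        for v, mu, var in zip(row, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return log_p
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical flow features: (packets per second, mean packet size in bytes).
train_x = [(10, 500), (12, 520), (9, 480), (900, 60), (950, 64), (870, 58)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = fit_gaussian_nb(train_x, train_y)

# Traffic whose labels are withheld from the model.
test_x = [(11, 510), (920, 62)]
predictions = [predict(model, row) for row in test_x]
print(predictions)
```

The quality of such a model depends entirely on how well the hand-designed features separate the classes, which is the feature-engineering burden discussed later in this section.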
Chan compared clustering methods with existing rule-based methods and found that machine learning methods perform well in identifying unlabeled attack traffic data.
Omar explored the performance of different machine learning methods, supervised and unsupervised, in terms of anomaly detection accuracy, and finally found that the unsupervised method performed better against unknown attacks, but the supervised method had higher detection accuracy against known attacks.
Casas used a simple decision-tree-based detection model on large-scale network traffic to study how machine learning methods perform in abnormal traffic detection, and found that the model still detects well when faced with a large volume of traffic data.
However, some researchers have found that the model's high recognition performance depends mainly on extensive feature engineering, which often requires a great deal of work and whose quality directly affects classification performance [3]. Because machine learning algorithms require explicit input features, feature extraction is crucial for detection effectiveness. Network traffic carries a large amount of information, and attackers use various covert means to hide attack traffic, so feature extraction is difficult, and different feature extraction methods lead to different performance. Since machine learning algorithms are trained on historical data, they generalize poorly to emerging attack behavior. Attackers constantly try new attack methods to evade detection systems, so the algorithms must be continually updated and optimized to keep up.
2.2. Attack Traffic Detection Technology Based on Deep Learning
As an important branch of machine learning, deep learning has also attracted wide attention for network attack traffic detection. Compared with traditional machine learning methods, deep-learning-based methods can automatically extract high-level features from network traffic and thus detect network attacks more accurately, addressing the feature-engineering limitations of traditional methods. Common deep learning algorithms for network attack detection include Deep Neural Networks (DNN) [4], Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory networks (LSTM) [5], and Autoencoders. These algorithms can extract deep, high-level features from raw network traffic data, have strong adaptability and generalization ability, and can effectively detect various types of network attacks.
1. Autoencoder-based Network Attack Traffic Detection Method
This method builds an autoencoder, trains it on normal traffic data, and uses it to detect abnormal traffic. The main idea is that normal traffic exhibits certain regularities and characteristics, which the autoencoder can learn and extract. Abnormal traffic, whose characteristics differ from those of normal traffic, can be detected by comparing the reconstruction error or anomaly score. Xu [8] successfully used an autoencoder to represent features and achieve anomaly detection within financial enterprises.
2. Convolutional Neural Network-based Network Attack Traffic Detection Method
This method constructs a convolutional neural network, trains it on traffic data, and uses it to detect abnormal traffic. The key idea is to use the CNN to extract and learn features from traffic data, enabling more accurate classification and detection of traffic. Lotfollahi [9] extracts the spatiotemporal features of data packets through multi-layer convolution and pooling operations, and then performs packet detection and identification through a fully connected layer and a softmax layer.
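The convolution, pooling, fully connected, and softmax stages named above can be sketched as a single forward pass. This is a minimal illustration with one untrained filter and made-up weights, not the architecture of the cited work; the input values, `kernel`, and `dense` are all hypothetical.

```python
import math

def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation, as used in CNN layers)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def relu(values):
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

def softmax(logits):
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical input: the first 8 payload bytes of a packet, scaled to [0, 1].
packet = [0.1, 0.9, 0.8, 0.1, 0.0, 0.7, 0.9, 0.2]
kernel = [1.0, -1.0, 0.5]                      # illustrative, untrained filter
dense = [[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]]   # 2 classes x 3 pooled features

features = max_pool(relu(conv1d(packet, kernel)), size=2)
logits = [sum(w * f for w, f in zip(row, features)) for row in dense]
probs = softmax(logits)
print(features, probs)
```

In a trained network the filter and dense weights would be learned from labeled traffic, and many filters and layers would be stacked.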
3. Recurrent Neural Network-based Network Attack Traffic Detection Method
Recurrent neural network models, such as LSTM or gated recurrent units (GRU) [6], are used to model and analyze network traffic data. This approach treats network traffic as sequential data and models it with a recurrent neural network, thus enabling the detection and identification of network attack traffic.
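To show what treating traffic as a sequence means in practice, the sketch below runs a one-unit plain (Elman) recurrent cell over a flow. It is deliberately simpler than the LSTM and GRU cells mentioned above, and the weights, the input sequence, and the sigmoid read-out are assumptions chosen for illustration.

```python
import math

def rnn_forward(sequence, w_in, w_rec, bias):
    """Run a one-unit Elman RNN over a sequence and return all hidden states."""
    hidden = 0.0
    states = []
    for x in sequence:
        # The new state mixes the current input with the previous state.
        hidden = math.tanh(w_in * x + w_rec * hidden + bias)
        states.append(hidden)
    return states

# Hypothetical sequence: normalized packet sizes of one flow, in arrival order.
flow = [0.2, 0.3, 0.25, 0.9, 0.95, 0.9]
states = rnn_forward(flow, w_in=1.5, w_rec=0.8, bias=-0.5)  # untrained weights
score = 1.0 / (1.0 + math.exp(-states[-1]))  # sigmoid read-out of the last state
print(states, score)
```

Because each state depends on the previous one, feeding the same packets in a different order yields a different final state, which is what lets recurrent models capture temporal patterns.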
Although deep learning models excel at handling highly complex operations and data, and can efficiently learn temporal and spatial features of traffic, their interpretability is poor. Due to the lack of clear feature definitions, the credibility of model training results can be questioned.
In summary, existing attack traffic detection technology has been widely used in complex network environments and has successfully addressed many security problems. However, we cannot ignore the limitations related to network traffic and the models themselves: machine learning models have high data requirements, so data imbalance issues can affect the accuracy of these models in detecting attack traffic. Therefore, it is necessary to consider these problems and propose new attack traffic detection methods.
2.3. Variational Autoencoder (VAE)[7]
The "variational" in the name refers to variations of functionals, so it helps to recall what a functional is before discussing variations. An ordinary function takes a given input value x and, through a series of operations f(x), produces an output value y: a number goes in and a number comes out. Can the argument be a function instead of a number? A classic question is: given two fixed points A and B, we may take any path from A to B; along which path is the travel time from A to B shortest? Most people know the answer for travel at constant speed: the straight line, which is the shortest path between two points. Here the input is an entire path, that is, a function, and the output is a single number. A mapping whose input is a function and whose output is a numerical value is called a functional; informally, a functional is a function of a function.
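A functional can be made concrete with a short numerical sketch: `path_length` below takes a function (a path) as input and returns a number (its length). The endpoints A = (0, 0) and B = (1, 1), the two candidate paths, and the segment count are illustrative assumptions; at constant speed, travel time is proportional to this length.

```python
import math

def path_length(path, a=0.0, b=1.0, steps=1000):
    """Approximate the arc length of y = path(x) on [a, b] with line segments."""
    total = 0.0
    dx = (b - a) / steps
    for i in range(steps):
        x0, x1 = a + i * dx, a + (i + 1) * dx
        total += math.hypot(x1 - x0, path(x1) - path(x0))
    return total

# Two candidate paths from A = (0, 0) to B = (1, 1).
def straight(x):
    return x

def curved(x):
    return x ** 2

print(path_length(straight))
print(path_length(curved))
```

The straight path gives a length close to the square root of 2, and the curved path is longer, consistent with the straight line being the minimizer of this functional.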
Usually, we feed the input image into the NN Encoder and obtain a latent code whose dimension is much smaller than that of the input object, so it is a compact representation of the input. Next, we feed this latent code into the NN Decoder [8], which decodes it and outputs a reconstruction of the original object.
Figure 1.
Auto-Encoder architecture diagram.
The Auto-Encoder (AE) was proposed by Rumelhart in 1986. It can process high-dimensional, complex data and promoted the development of neural networks. An autoencoder is an unsupervised learning algorithm (the training examples are not labeled) that is trained with the backpropagation (BP) algorithm to make its output as close to its input as possible.
AE networks generally have two characteristics:
1. dim(Hidden layer) << dim(Input layer): the hidden layer dimension should be much smaller than the input dimension.
2. The output of the decoding layer is used to reconstruct the input, so we should minimize Reconstruction error(Input, Output), that is, the reconstruction error between input and output.
AE algorithm description:
1. The encoder compresses the input data: through the hidden layer, it maps the n-dimensional input to m-dimensional data (m << n). In other words, the encoder learns a set of parameters that produce a latent code;
2. The decoder restores the data, recovering the original data from the latent code with as little loss as possible when needed.
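The encoder and decoder roles described above can be sketched with the smallest possible case: a linear autoencoder that compresses n = 2 inputs to an m = 1 code and is trained by gradient descent on the reconstruction error. The toy data, initial weights, learning rate, and epoch count are assumptions chosen so the example runs with the standard library only.

```python
# Toy data: 2-D points on the line x1 = 2 * x0, so a single latent
# dimension (m = 1) is enough to describe the n = 2 inputs.
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

enc = [0.5, 0.5]  # encoder weights: 2 inputs -> 1 latent code
dec = [0.5, 0.5]  # decoder weights: 1 latent code -> 2 outputs

def encode(x):
    return enc[0] * x[0] + enc[1] * x[1]

def decode(code):
    return (dec[0] * code, dec[1] * code)

def reconstruction_error():
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)

initial_error = reconstruction_error()
learning_rate = 0.05
for _ in range(500):
    grad_enc, grad_dec = [0.0, 0.0], [0.0, 0.0]
    for x in data:
        code = encode(x)
        x_hat = decode(code)
        err = [x_hat[k] - x[k] for k in range(2)]
        # Backpropagation: the output error flows back through the code.
        d_code = sum(2.0 * err[k] * dec[k] for k in range(2))
        for k in range(2):
            grad_dec[k] += 2.0 * err[k] * code / len(data)
            grad_enc[k] += d_code * x[k] / len(data)
    for k in range(2):
        enc[k] -= learning_rate * grad_enc[k]
        dec[k] -= learning_rate * grad_dec[k]

final_error = reconstruction_error()
print(initial_error, final_error)
```

Practical autoencoders add nonlinear activations and more layers, but the training objective is the same reconstruction error.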
AE can be applied to data dimensionality reduction, feature extraction and data visualization analysis in machine learning, and can also be extended and applied to generative models.
For an Auto-Encoder to serve as a generative model, it should generally satisfy the following two points:
1. The encoder and decoder can be separated and used independently [9] (similar to the Generator and Discriminator of a GAN).
2. Any code sampled from the fixed-dimensional code space should produce a clear and realistic picture when passed through the decoder.
VAE adds appropriately scaled noise to the code of the original AE structure. First, we feed the input into the NN Encoder and compute two sets of encodings: the mean code, and the variance code, which controls the degree of noise interference. The variance code is mainly used to assign a weight to the noise code. In the figure, an exponential operation is applied to the variance code before it is used as the weight of the noise code; the main reason is that the values learned by the NN can be positive or negative, and the exponential ensures that the assigned weight is positive. Finally, we add the weighted noise code to the original mean code to obtain a new latent code, which is then fed into the NN Decoder. As the figure shows, the loss function contains one additional term besides the reconstruction error of the traditional AE.
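The noise-injection step can be sketched as follows: the latent code is the mean code plus the noise code weighted by the exponential of the variance code. The additional loss term itself is not recoverable from the text, so `extra_loss` uses a form frequently paired with this construction, the sum of exp(s) - (1 + s) + m^2 over the code dimensions; treat that formula, and all function and variable names here, as assumptions rather than the authors' definition.

```python
import math
import random

def sample_latent(mean_code, variance_code, rng):
    """Latent code = mean code + exp(variance code) * noise, noise ~ N(0, 1)."""
    return [
        m + math.exp(s) * rng.gauss(0.0, 1.0)
        for m, s in zip(mean_code, variance_code)
    ]

def extra_loss(mean_code, variance_code):
    """Assumed regularizer: sum of exp(s) - (1 + s) + m^2 over code dimensions."""
    return sum(
        math.exp(s) - (1.0 + s) + m ** 2
        for m, s in zip(mean_code, variance_code)
    )

rng = random.Random(0)
mean_code = [0.5, -1.0]       # hypothetical encoder outputs
variance_code = [0.0, -2.0]   # may be negative; exp() makes the weight positive
latent = sample_latent(mean_code, variance_code, rng)
print(latent, extra_loss(mean_code, variance_code))
```

Under this assumed form, the term is zero when the mean code and variance code are both zero, and it grows without bound as the variance code becomes very negative, so the network cannot drive the noise weight to zero and fall back to a plain AE.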
VAE is an unsupervised generative model whose principle can be understood in terms of a Gaussian mixture model.
Figure 2.
Flow chart of VAE algorithm.
From the model structure of VAE [10], we can see that the noise code z is a vector drawn from a standard normal distribution. In a Gaussian mixture model, we randomly sample a component index m, where m follows a multinomial distribution $P(m)$. Every time we sample an m, we map it to a Gaussian distribution $N(\mu^m, \Sigma^m)$, so the data distribution under the mixture model can be represented as:
$P(x) = \sum_m P(m)\,P(x \mid m)$, where $m \sim P(m)$ and $x \mid m \sim N(\mu^m, \Sigma^m)$.
With this construction, the original discrete encoding, which contains many distorted regions, is converted into a continuous and effective encoding.
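The two-stage sampling in the mixture model (first draw m, then draw x from the Gaussian that m selects) can be sketched in one dimension. The mixture weights, means, and standard deviations below are hypothetical values chosen for illustration.

```python
import random

def sample_mixture(weights, means, stds, rng):
    """Draw m from P(m), then x from the Gaussian N(means[m], stds[m]^2)."""
    m = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return m, rng.gauss(means[m], stds[m])

rng = random.Random(0)
weights = [0.7, 0.3]   # hypothetical P(m)
means = [0.0, 10.0]    # hypothetical component means
stds = [1.0, 0.5]      # hypothetical component standard deviations

samples = [sample_mixture(weights, means, stds, rng) for _ in range(2000)]
share_first = sum(1 for m, _ in samples if m == 0) / len(samples)
print(share_first)
```

The share of samples drawn from the first component should land close to its weight of 0.7, and each x clusters around the mean of the component that generated it.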
Therefore, variational autoencoder (VAE)-based anomaly detection methods identify anomalies by learning latent representations of the data. VAE is a generative model that compresses input data into a latent space via an encoder and then reconstructs the data via a decoder. Normal data is reconstructed accurately by the VAE, while abnormal data yields a large reconstruction error because its distribution differs from that of the training data. For example, in network traffic monitoring, a VAE can learn normal traffic patterns, and traffic with a significant reconstruction error can be flagged as a potential network attack or failure.
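The flagging step can be sketched independently of the model: given reconstruction errors measured on normal traffic, choose a threshold and flag anything above it. The mean-plus-three-standard-deviations rule and the error values below are assumptions for illustration; the text does not specify how the threshold is chosen.

```python
import statistics

def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k standard deviations of errors on normal traffic."""
    return statistics.mean(normal_errors) + k * statistics.pstdev(normal_errors)

def flag_anomalies(errors, threshold):
    """Mark every sample whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]

# Hypothetical reconstruction errors produced by an already trained model.
normal_errors = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10, 0.08, 0.12]
threshold = fit_threshold(normal_errors)
flags = flag_anomalies([0.11, 0.95, 0.14], threshold)
print(threshold, flags)
```

A larger `k` lowers the false-alarm rate at the cost of missing subtler anomalies, so in practice the value would be tuned on validation traffic.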
The advantage of VAE lies in its strong generative capability and flexibility [11, 12]. It can not only handle high-dimensional data but also capture complex patterns in the data, which makes it well suited to anomaly detection. VAE adapts better to new and unseen anomalies than traditional statistical or rule-based methods. For example, in the predictive maintenance of industrial equipment, a VAE can model the equipment's normal operating data; when the equipment behaves abnormally, the VAE can quickly identify the deviation and issue an alert, helping to avoid potential failures and downtime.