4.2. Setup of Proposed Model
A Generative Adversarial Network (GAN) is a deep learning method used to generate new data. It performs an unsupervised learning task that involves learning from input data in order to produce new samples that follow the distribution of the original dataset. GANs have been applied in the literature to many domains, such as computer vision [25], time-series analysis [26], and health [27], achieving significant advances in data generation. Since many improvements and variants of the GAN have been proposed to fit particular application domains and to increase performance and model accuracy [28,29], this paper proposes a new variant, called Triple Discriminator Conditional Generative Adversarial Network (TDCGAN), as an augmentation tool that generates new data for the UGR'16 dataset with the aim of restoring balance to the dataset by enlarging the minority attack classes.
In TDCGAN, the architecture consists of one generator and three discriminators. The generator takes random noise from a latent space as input and generates data that closely resemble the real data, aiming to avoid detection by the discriminators. Each discriminator is a deep neural network with a different architecture and different parameter settings. The role of each discriminator is to extract features from the output of the generator and classify the data, with each discriminator achieving a different level of accuracy. An election layer is added at the end of the TDCGAN architecture; it takes the outputs of the three discriminators and performs an election procedure to select the best result, i.e., the one with the highest classification accuracy, in the form of an ensemble method. The model classifies data into two groups: normal flows (background traffic), represented by 0, and anomalous flows (attack traffic), represented by 1. In addition, when a flow is classified as anomalous, the model assigns it to its specific attack class.
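Since the text does not detail the election procedure, the following sketch illustrates one plausible reading: a majority vote over the three discriminators' predicted classes, with mean confidence as a tie-break. All names here are illustrative rather than taken from the paper.

```python
import numpy as np

def election_layer(d1_probs, d2_probs, d3_probs):
    """Combine the class-probability outputs of the three discriminators.

    Each argument has shape (batch, n_classes). The rule used here
    (majority vote with a mean-confidence tie-break) is an assumption;
    the paper does not spell out the exact election procedure.
    """
    votes = np.stack([d1_probs.argmax(axis=1),
                      d2_probs.argmax(axis=1),
                      d3_probs.argmax(axis=1)], axis=1)   # (batch, 3)
    mean_conf = (d1_probs + d2_probs + d3_probs) / 3.0    # (batch, n_classes)

    final = np.empty(votes.shape[0], dtype=int)
    for i, row in enumerate(votes):
        classes, counts = np.unique(row, return_counts=True)
        if counts.max() >= 2:            # at least two discriminators agree
            final[i] = classes[counts.argmax()]
        else:                            # three-way disagreement: most confident class
            final[i] = mean_conf[i].argmax()
    return final
```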
Figure 2 shows the workflow of the proposed TDCGAN model. The configuration details of the generator and of each discriminator are given below.
The generator model is a deep Multi-Layer Perceptron (MLP) composed of an input layer, an output layer, and four hidden layers. The generator starts from a point in the latent space to generate new data. The latent space is a multi-dimensional space of normally distributed points, where each variable is drawn from the distribution of the data in the dataset. An embedding layer in the generator creates a vector representation of the sampled point. Through training, the generator learns to map points from the latent space to specific output data, and this mapping differs each time the model is trained. New data are then generated from random points in the latent space; in other words, points in the latent space correspond to specific generated samples. The discriminator distinguishes the new data generated by the generator from the true data distribution.
A GAN is an unsupervised learning model in which the generator and discriminator are trained simultaneously [30]. The generator produces a batch of samples, which, along with real examples from the domain, are fed to the discriminator. The discriminator then classifies them as either real or fake. Subsequently, the discriminator is updated to improve its ability to distinguish between real and fake samples in the next round, while the generator is updated according to how well its generated samples deceived the discriminator.
In this manner, the two models engage in a competitive relationship, exhibiting adversarial behaviour in the context of game theory. In this scenario, the concept of zero-sum implies that when the discriminator effectively distinguishes between real and fake samples, it receives a reward or no adjustments are made to its model parameters. Conversely, the generator is penalized with significant updates to its model parameters.
Alternatively, when the generator successfully deceives the discriminator, it receives a reward, or no modifications are made to its model parameters, whereas the discriminator is penalized and its model parameters are updated. This is the generic GAN approach.
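For reference, this zero-sum game corresponds to the standard minimax objective of the original GAN formulation:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big],$$

where $G$ is the generator, $D$ the discriminator, $p_{\mathrm{data}}$ the distribution of the real data, and $p_z$ the latent noise distribution.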
In the proposed TDCGAN model, the generator takes as input a point from the latent space and produces data that follow the distribution of the real data in the dataset. This is done through a fully connected network with four hidden layers, one input layer, and one output layer. The discriminators then try to classify the data into the corresponding class, which is done through fully connected MLP networks.
The MLP has gained widespread popularity as a preferred choice among neural networks [31,32]. This is primarily attributed to its fast computational speed, straightforward implementation, and ability to achieve satisfactory performance with relatively small training datasets.
In this paper, the generator model learns to generate new data similar to the minority classes in the UGR'16 dataset, while the discriminators try to distinguish the real data in the dataset from the new data produced by the generator. During the training process, both the generator and the discriminator models are conditioned on the class label. This conditioning enables the generator model, when used on its own, to generate minority-class data corresponding to a specific class label. The TDCGAN model can then be formulated by integrating the generator and the three discriminators into a single, larger model.
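As an illustration, a common way to realize this class-label conditioning is to embed the label and concatenate it with the latent point before the generator's hidden layers. The paper does not specify the exact mechanism, so the layer sizes and names below are assumptions:

```python
from tensorflow.keras import layers

LATENT_DIM, N_CLASSES = 100, 6   # assumed values, not given in the paper

noise = layers.Input(shape=(LATENT_DIM,))
label = layers.Input(shape=(1,), dtype="int32")

# Embed the class label and merge it with the latent point so the
# generator can be asked for samples of a specific (minority) class.
label_vec = layers.Flatten()(layers.Embedding(N_CLASSES, 16)(label))
conditioned = layers.Concatenate()([noise, label_vec])
```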
The discriminator models are trained separately, and each discriminator's weights are marked as non-trainable within the combined TDCGAN model. This ensures that only the weights of the generator model are updated during training of the combined model. This trainability modification applies only when training the TDCGAN model, not when training each discriminator independently. The TDCGAN model is therefore used to train the generator's weights using the outputs and errors computed by the discriminator models.
A point in the latent space is provided as input to the TDCGAN model. The generator produces data from this input, which are subsequently fed to the discriminator models. Each discriminator then outputs a classification, determining whether the data are real or fake; in the case of fake data, the model also assigns the corresponding class.
The generator takes a batch of vectors z, randomly drawn from a Gaussian distribution, and maps them to G(z), which has the same dimensionality as the dataset. The discriminators take the output of the generator and try to classify it. The loss between the observed data and the predicted data is then evaluated and used to update the weights of the generator only. The difference between the observed and predicted data is estimated using the cross-entropy loss function expressed in the following equation:
$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\big[y_i \log p(y_i) + (1 - y_i)\log\big(1 - p(y_i)\big)\big],$$

where $y_i$ is the true label (1 for malicious traffic and 0 for normal traffic), $p(y_i)$ is the predicted probability for observation $i$ computed by the sigmoid activation function, and $N$ is the number of observations in the batch.
The generator model has four hidden layers. The first hidden layer is composed of 256 neurons with a Rectified Linear Unit (ReLU) activation function. An embedding layer is used between hidden layers to efficiently map the input data from a high-dimensional to a lower-dimensional space, which allows the network to learn relationships in the data and process it efficiently. The second hidden layer comprises 128 neurons, the third 64 neurons, and the last one 32 neurons; ReLU is used as the activation function for all of them, and dropout regularization of 20% is added to avoid overfitting. The output layer uses a Softmax activation function and has 14 neurons, matching the number of features in the dataset.
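For concreteness, a minimal Keras-style sketch of this generator follows. The latent dimension and the placement of dropout after every hidden layer are assumptions, and the embedding/conditioning path sketched earlier is omitted for brevity:

```python
from tensorflow.keras import layers, models

def build_generator(latent_dim=100, n_features=14):
    """Generator as described in the text: four ReLU hidden layers
    (256/128/64/32), 20% dropout, and a 14-unit softmax output layer.
    latent_dim=100 is an assumed value."""
    return models.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(n_features, activation="softmax"),
    ])
```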
After defining the generator, we define the architecture of each discriminator in the proposed model. Each discriminator is an MLP with a different number of hidden layers, a different number of neurons, and a different dropout percentage. The first discriminator is composed of three hidden layers with 100 neurons each and 10% dropout regularization. The second has five hidden layers with 64, 128, 256, 512, and 1024 neurons, respectively, and a dropout percentage of 40%. The last discriminator has four hidden layers with 512, 256, 128, and 64 neurons, respectively, and a dropout percentage of 20%. LeakyReLU (alpha = 0.2) is used as the activation function for the hidden layers of all discriminators. Each discriminator has two output layers: one with a Sigmoid activation function for the real/fake decision and one with a Softmax activation function for the class label. Each model is trained with two loss functions: binary cross-entropy for the Sigmoid output layer and categorical cross-entropy for the Softmax output layer. The outputs of the three discriminators are then fed to the last layer of the model, where the election is performed to obtain the best result.
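The three discriminators can then be sketched with a single builder, reflecting the two output heads and the two losses described above; the number of attack classes is a placeholder:

```python
from tensorflow.keras import layers, models

def build_discriminator(hidden_units, dropout, n_features=14, n_classes=6):
    """One of the three discriminators: a LeakyReLU(0.2) hidden stack with
    a sigmoid real/fake head and a softmax class head. n_classes=6 is a
    placeholder for the number of attack classes used from UGR'16."""
    x = inputs = layers.Input(shape=(n_features,))
    for units in hidden_units:
        x = layers.Dense(units)(x)
        x = layers.LeakyReLU(alpha=0.2)(x)
        x = layers.Dropout(dropout)(x)
    real_fake = layers.Dense(1, activation="sigmoid", name="real_fake")(x)
    attack_cls = layers.Dense(n_classes, activation="softmax", name="attack_class")(x)
    model = models.Model(inputs, [real_fake, attack_cls])
    model.compile(optimizer="adam",
                  loss={"real_fake": "binary_crossentropy",
                        "attack_class": "categorical_crossentropy"})
    return model

# The three architectures given in the text:
d1 = build_discriminator([100, 100, 100], dropout=0.10)
d2 = build_discriminator([64, 128, 256, 512, 1024], dropout=0.40)
d3 = build_discriminator([512, 256, 128, 64], dropout=0.20)
```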
The TDCGAN model is then defined by combining the generator model and the three discriminator models into one large model. This large model is used to train the weights of the generator, using the outputs and errors calculated by the discriminators, while the discriminators are trained separately on real input taken from the dataset.
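A sketch of this combined model, assuming a Keras-style implementation in which the discriminator weights are frozen only inside the larger model (each discriminator keeps its own separately compiled, trainable copy):

```python
from tensorflow.keras import layers, models, optimizers

def build_tdcgan(generator, discriminators, latent_dim=100):
    """Combined model used to update the generator only. Freezing is applied
    after the discriminators were compiled on their own, so it affects only
    this combined model; latent_dim=100 is an assumed value."""
    for d in discriminators:
        d.trainable = False
    z = layers.Input(shape=(latent_dim,))
    fake = generator(z)
    # Each discriminator contributes its real/fake and class outputs.
    outputs = [out for d in discriminators for out in d(fake)]
    gan = models.Model(z, outputs)
    gan.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
                loss=["binary_crossentropy", "categorical_crossentropy"] * 3)
    return gan
```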
The model is trained for 1000 epochs with a batch size of 128. The optimizer is Adam with a learning rate of 0.0001. The proposed model allows the generator to train until it produces new data samples that resemble the real distribution of the original dataset.
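A compact sketch of the resulting alternating training loop under these settings; `X_real`, `y_real`, `y_fake`, and `y_target` are placeholder arrays (real samples, their one-hot class labels, and the labels assigned to generated data), none of which are named in the paper:

```python
import numpy as np

EPOCHS, BATCH, LATENT_DIM = 1000, 128, 100   # latent dimension is assumed

for epoch in range(EPOCHS):
    # 1) Train each discriminator on a half-real, half-fake batch.
    idx = np.random.randint(0, len(X_real), BATCH // 2)
    z = np.random.normal(size=(BATCH // 2, LATENT_DIM))
    x_fake = generator.predict(z, verbose=0)
    for d in (d1, d2, d3):
        d.train_on_batch(X_real[idx], [np.ones((BATCH // 2, 1)), y_real[idx]])
        d.train_on_batch(x_fake, [np.zeros((BATCH // 2, 1)), y_fake])

    # 2) Train the generator through the frozen discriminators: it is
    #    rewarded when its fakes are scored as "real" by all three heads.
    z = np.random.normal(size=(BATCH, LATENT_DIM))
    gan.train_on_batch(z, [np.ones((BATCH, 1)), y_target] * 3)
```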
Nevertheless, this training strategy frequently fails to work effectively in various application scenarios. This is because the relationships within the feature set of the dataset produced by the generator must be preserved, while the dataset used by the discriminator may differ from it. This disparity often leads to instability during training of the generator.
In numerous instances, the discriminator converges quickly during the initial stages of training, thereby preventing the generator from reaching its optimal state. To tackle this challenge in network intrusion detection tasks, we adopt a modified training strategy in which three discriminators with different architectures are used. This approach helps prevent the early emergence of an optimal discriminator, ensuring a more balanced training process between the generator and the discriminators.