This chapter outlines the process of implementing the proposed framework for intrusion detection in wireless networks using machine learning techniques.
Wireless technologies have spread rapidly in recent years. While serious efforts have been made to secure these technologies, most security measures have proven inadequate in practice. The AWID project aims to provide a solid basis for researchers to develop robust security mechanisms for current and future generations of wireless networks by providing tools, methodologies, and datasets, since previous datasets were not specific to wireless networks.
The AWID dataset was extracted in 2016 and was later developed into a new version, released in 2021 and called AWID3. The main differences between the old version and the new one can be summarized as follows:
A. Structure of AWID3 Dataset
The AWID3 dataset has been carefully curated to record and examine the traces of different attacks within the IEEE 802.1X Extensible Authentication Protocol (EAP) environment. It is a valuable, publicly available resource, and it is significant for being the first dataset to offer a review of the IEEE 802.11w standard, which hardware must support to be approved for use with the WPA3 protocol. The dataset has 254 features, of which 253 are general features and one is used for labeling. It is offered in CSV format for simple access and interoperability with many different data analysis tools and methodologies. The extracted features cover both the MAC (Media Access Control) layer and the application layer, enabling a thorough understanding of network activity and attack patterns. The dataset consists of 36,913,503 instances: 30,387,099 of normal traffic and 6,526,404 malicious ones. The malicious traffic includes 13 types of attacks:
- Deauthentication Attack
- Disassociation Attack
- Re-association Attack
- Rogue AP Attack
- Krack Attack
- Kr00k Attack
- SSH Brute Force Attack
- Botnet Attack
- Malware
- SSDP Amplification
- SQL Injection Attack
- Evil Twin
- Website Spoofing
In our research, we used two of these attacks, Krack and Kr00k, since they are the most recently discovered attacks against IEEE 802.11.
1) Krack Attack: The Krack attack has been identified as a security risk to the encryption techniques that have preserved and protected Wi-Fi networks for the past 15 years. Although information about the attack itself is publicly available, there is no guarantee that every device will receive a patch and be protected from these attacks, which can come from any networked point [37, 38]. According to the study in [39], the four-way handshake procedure, which is a crucial part of the IEEE 802.11 protocol, has a serious weakness that allows an attacker to decode a user's communication without eavesdropping on the handshake or knowing the encryption key. This flaw results from the use of a particular message counter in the Pairwise Transient Key (PTK) installation process. To understand the decryption process, it is vital to look at how keystreams are used in the encryption process. The plaintext and the keystream are merged using the XOR (exclusive OR) operation to create the encrypted message that is sent from the client to the Access Point (AP). The keystream is created by scrambling the PTK, which is derived using AES (the Advanced Encryption Standard), with a number of other factors. The vulnerability, however, lies only in the last phase of the XOR operation; the logic of this step rests on a fundamental mathematical property that the KRACK vulnerability exploits. Equation (1) shows how the plaintext (P) and the keystream (KS) are combined to create the ciphertext (E):

$$E = P \oplus KS \tag{1}$$

The KRACK attack uses this property of the XOR method to decrypt the encrypted communications, putting the security of wireless networks that use the IEEE 802.11 standard at risk.
An attacker can use two captured encrypted packets to decrypt them: since the keystreams are identical, XORing the two ciphertexts cancels the keystreams and leaves the XOR of the two plaintexts.
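The cancellation follows directly from Equation (1) and the fact that XOR is its own inverse:

$$E_1 \oplus E_2 = (P_1 \oplus KS) \oplus (P_2 \oplus KS) = P_1 \oplus P_2$$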
If the attacker can accurately guess or already knows P1, they can recover P2; the well-known first message that the AP or client sends after connecting can be used for this. The WPA2 keystream was designed to prevent this exploitation, but the KRACK researchers found a way around it. Most of the keystream is made up of static values: the PTK, the GTK, flags, MAC addresses, and counters. The only value that changes between encrypted communications is the packet number. Because every encrypted communication will have a different packet number and a unique keystream, XOR cancellation should not be possible [40].
2) Kr00k Attack: Kr00k is a vulnerability that allows some Wi-Fi traffic encrypted with WPA2 to be decrypted. The security company ESET discovered the vulnerability in 2019 and reported that it affects more than a billion devices. Devices whose Wi-Fi chips have not yet received a patch from Broadcom or Cypress are vulnerable to Kr00k. These Wi-Fi chips are used by the majority of modern Wi-Fi-enabled devices, including smartphones, tablets, laptops, and Internet of Things (IoT) devices [41].
Table 1 highlights the main differences between the Krack and Kr00k attacks.
B. Preprocessing Steps
As mentioned before, the dataset consists of 36,913,503 instances: 30,387,099 of normal traffic and 6,526,404 malicious ones, of which 49,990 instances belong to the Krack attack and 186,173 to the Kr00k attack.
We implement our experiment in two phases: the first phase consists of two binary cases, (Krack, normal) and (Kr00k, normal), while the second phase is multi-class (Krack, Kr00k, and normal).
To highlight the importance of preprocessing the dataset before using it in the proposed model, we first used the chosen samples without any preprocessing or feature selection techniques. The first sample consists of 106,971 Kr00k instances and 106,791 normal ones, while the second sample consists of 33,180 Krack instances and 34,000 normal ones, with 254 features for both samples, as shown in Table 2.
We chose the following machine learning algorithms:
1) Decision Tree: The process for building a decision tree, using the most commonly used criteria for splitting the data, is as follows [42]:
- Calculate an impurity measure for the entire dataset (e.g., Gini impurity or entropy).
- For each feature, calculate the impurity measure of splitting the data based on the values of that feature.
- Choose the feature that produces the lowest impurity measure after splitting the data.
- Split the data based on the chosen feature and repeat the process for each resulting subset of data until a stopping criterion is met (e.g., a maximum depth is reached or the number of samples in a leaf node falls below a certain threshold).
The equations for calculating impurity measures depend on the specific criterion being used. For example, the Gini impurity measure for a set of samples $S$ with $C$ classes is:

$$Gini(S) = 1 - \sum_{i=1}^{C} p_i^2$$

where $p_i$ is the proportion of samples in $S$ that belong to class $i$. The entropy impurity measure for the same set of samples $S$ is:

$$Entropy(S) = - \sum_{i=1}^{C} p_i \log_2 p_i$$

where $p_i$ is the same as above. These impurity measures are used to evaluate the quality of each split and to choose the feature that produces the lowest impurity measure.
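As an illustration, a minimal Python sketch of these two impurity measures (the function names are ours, for illustration only):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy impurity: -sum(p_i * log2(p_i)) over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Example: a node holding 3 'normal' samples and 1 'krack' sample.
print(gini(["normal", "normal", "normal", "krack"]))     # 0.375
print(entropy(["normal", "normal", "normal", "krack"]))  # ~0.811
```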
2) Ensemble Classifiers: Ensemble classifiers combine multiple individual classifiers into a single ensemble classifier to improve the overall predictive performance. There are different types of ensemble classifiers, such as bagging, boosting, and stacking, and the equations used for each type vary.
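For example, a minimal bagging sketch in scikit-learn, whose default base estimator is a decision tree (the data and parameter values are synthetic placeholders, not our experimental setup):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Toy stand-in for an AWID3 sample: 1,000 instances, 15 features, 2 classes.
X, y = make_classification(n_samples=1000, n_features=15, random_state=0)

# Bagging: many base classifiers (decision trees by default) trained on
# bootstrap resamples of the data; predictions are combined by voting.
ensemble = BaggingClassifier(n_estimators=50, random_state=0)
ensemble.fit(X, y)
print(ensemble.score(X, y))
```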
3) SVM: Support Vector Machine (SVM) is a popular machine learning algorithm for classification, regression, and outlier detection. The main idea behind SVM is to find a hyperplane that separates the data into different classes with the largest possible margin. The main equations used in SVM are as follows [43]. The decision function is:

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_i y_i K(x, x_i) + b\right)$$

where $f(x)$ is the predicted class label, $\alpha_i$ is the Lagrange multiplier for the $i$-th training sample, $y_i$ is the class label of the $i$-th training sample (either +1 or -1), $K(x, x_i)$ is the kernel function that maps the input features $x$ and $x_i$ to a higher-dimensional space, and $b$ is the bias term. The margin to be maximized is:

$$\frac{2}{\lVert w \rVert}$$

where $w$ is the weight vector of the hyperplane. The dual objective is:

$$W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j), \quad \text{subject to } 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0$$

where $W(\alpha)$ is the objective function to be maximized, $C$ is a user-defined parameter that controls the trade-off between the margin and the number of training errors, and the constraints ensure that the Lagrange multipliers are non-negative (and bounded by $C$) and that, weighted by the class labels, they sum to zero.
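A minimal scikit-learn SVM sketch (the kernel choice and the value of C here are illustrative, not our tuned settings):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy binary dataset standing in for a (Krack, normal) sample.
X, y = make_classification(n_samples=500, n_features=15, random_state=0)

# SVC solves the dual problem above; C trades margin width against
# training errors, and `kernel` selects the function K(x, x_i).
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)
print(clf.score(X, y))
```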
4) Kernel: A kernel function is a function that maps the input data into a higher-dimensional space, where it is easier to find a separating hyperplane. The equation of the linear kernel is [44]:

$$K(x_i, x_j) = x_i^{T} x_j$$

where $x_i$ and $x_j$ are the input features of the $i$-th and $j$-th training samples, respectively.
5) KNN: K-Nearest Neighbors (KNN) is a simple yet effective machine learning algorithm used for classification and regression tasks. The basic idea behind KNN is to find the K training samples nearest to a given test sample according to a distance metric, and then use the labels of those K nearest neighbors to predict the label of the test sample. The distance metric in KNN can be represented, for example, by the Euclidean distance [42]:

$$d(x, y) = \sqrt{\sum_{l=1}^{p} (x_l - y_l)^2}$$

where $p$ is the number of features in each sample.
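A minimal scikit-learn sketch of KNN classification (synthetic data standing in for the AWID3 samples):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=15, random_state=0)

# KNN with K=5: a test sample takes the majority label of its
# 5 nearest training samples under the Euclidean distance above.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
print(knn.score(X, y))
```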
6) Neural Network: Neural networks are a powerful class of machine learning algorithms inspired by the structure and function of the human brain. A neural network consists of multiple layers of interconnected processing units called neurons; the input data is processed through the network in a forward pass, with the output of each layer serving as the input to the next layer. Some of the equations used in neural networks are as follows [45]. The sigmoid activation function is given by:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

where $z$ is the input to the neuron. The ReLU function is given by:

$$ReLU(z) = \max(0, z)$$

where $z$ is the input to the neuron. The output of a neuron is computed as:

$$y_j = f\left(\sum_{i} w_{ij} x_i + b_j\right)$$

where $y_j$ is the output of the $j$-th neuron in the layer, $x_i$ is the $i$-th input to the layer, $w_{ij}$ is the weight of the connection between the $i$-th input and the $j$-th neuron, $b_j$ is the bias term of the $j$-th neuron, and $f$ is the activation function. The weights and biases are updated by gradient descent:

$$w_{ij} \leftarrow w_{ij} - \alpha \frac{\partial E}{\partial w_{ij}}, \qquad b_j \leftarrow b_j - \alpha \frac{\partial E}{\partial b_j}$$

where $E$ is the error function, $\alpha$ is the learning rate, and $\partial E / \partial w_{ij}$ and $\partial E / \partial b_j$ are the partial derivatives of the error with respect to the weights and biases, respectively.
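A minimal sketch of such a network in scikit-learn (the hidden-layer size and learning rate are illustrative, not our tuned settings):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=15, random_state=0)

# A small multilayer perceptron: one hidden layer of 32 ReLU neurons,
# trained by gradient descent on the weights and biases as above.
mlp = MLPClassifier(hidden_layer_sizes=(32,), activation="relu",
                    learning_rate_init=0.001, max_iter=500, random_state=0)
mlp.fit(X, y)
print(mlp.score(X, y))
```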
After applying the different ML algorithms with 10-fold cross-validation (K = 10), the accuracy results were very low, which is to be expected. The results are presented in Table 3.
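For reference, a hedged sketch of such a 10-fold evaluation in scikit-learn (the classifier and the data here are placeholders for each algorithm above):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=15, random_state=0)

# 10-fold cross-validation: split the data into 10 folds, train on 9
# and test on the held-out fold, then average the 10 accuracy scores.
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
print(scores.mean())
```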
The accuracy results proved the importance of the preprocessing steps, since preprocessing is a constructive and essential step for obtaining the correct data required to build a classifier, as shown in several studies such as [46, 47, 48]. Data preprocessing, which aims to convert the raw data into a simpler and more efficient format for subsequent processing steps, is a crucial step in the knowledge discovery process, because quality decisions must be based on quality data. Thus, the preprocessing procedures were carried out on the AWID3 dataset, which consists of 13 CSV files with 36,913,503 instances: 30,387,099 of normal traffic and 6,526,404 malicious ones. The dataset was studied and well understood before processing.
1) Detecting the Krack Attack: Given the importance of the preprocessing step mentioned above, the preprocessing procedure for the Krack dataset sample was as follows (a code sketch of these steps is given after the list):
1- Deleting the constant and empty features.
2- Ignoring features that have more than 60% missing values.
3- Replacing missing values with NaN.
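A minimal pandas sketch of these three steps (the file name is hypothetical, and the "?" missing-value marker is an assumption about how the CSVs encode missing cells):

```python
import pandas as pd

# Hypothetical file name; treating "?" as missing implements step 3,
# since pandas represents those cells as NaN on load.
df = pd.read_csv("krack_sample.csv", na_values=["?"])

# Step 1: delete constant and entirely empty features.
df = df.loc[:, df.nunique(dropna=True) > 1]

# Step 2: ignore features with more than 60% missing values.
df = df.loc[:, df.isna().mean() <= 0.60]

print(df.shape)
```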
The remaining dataset consists of 67 features and 67,180 instances (33,180 Krack traffic and 34,000 normal traffic).
After preprocessing the data, we applied different ML algorithms.
Table 4 shows the performance of the learning algorithms after preprocessing.
For the same sample, we then used feature selection techniques to reduce the computing time and enhance the accuracy of the detection model. We chose the ANOVA feature selection technique, a widely used statistical approach for comparing different independent means, as shown in Figure 3.
The ANOVA F-value for the i-th feature is computed as:

$$F_i = \frac{MSB}{MSW}, \qquad MSB = \frac{1}{k-1}\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2, \qquad MSW = \frac{1}{N-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$

where $F_i$ is the ANOVA F-value for the $i$-th feature, MSB is the between-group mean square, MSW is the within-group mean square, $n_i$ is the number of samples in group $i$, $k$ is the total number of groups, $N$ is the total number of samples, $\bar{x}_i$ is the mean of group $i$, $\bar{x}$ is the overall mean, and $x_{ij}$ is the value of the $j$-th sample in group $i$.
The features are ranked in the ANOVA method by calculating the ratio of the variance between groups to the variance within groups [49]. The accuracy results after applying ANOVA feature selection (FS = 15) are shown in Table 5.
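A minimal sketch of this selection step using scikit-learn's ANOVA scorer, keeping FS = 15 features as above (the data here is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=1000, n_features=67, random_state=0)

# f_classif computes the ANOVA F-value of each feature against the
# class labels; keep the 15 highest-scoring features (FS = 15).
selector = SelectKBest(score_func=f_classif, k=15)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)  # (1000, 15)
```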
The results proved the necessity of preprocessing the dataset, since an efficient result depends on efficient data. Furthermore, the results show an improvement in accuracy when feature selection techniques are used, as Figure 4 shows.
The best accuracy result we obtained was from the Ensemble classifier, with 99.1% accuracy and a 1.8% false negative rate, followed by Naive Bayes with 95% accuracy and a 2.3% false negative rate.
2) Detecting the Kr00k Attack: This sample consists of 235,064 instances (106,971 Kr00k traffic and 128,093 normal ones). The preprocessing procedure for the Kr00k dataset sample was as follows:
1- Deleting the constant and empty features.
2- Ignoring features with more than 60% missing values; 63 features remain.
3- Replacing missing values with NaN.
The remaining dataset consists of 63 features and 235,064 instances.
After preprocessing the data, we applied different ML algorithms.
Table 6 shows the performance of the learning algorithms after preprocessing.
For the same sample, we then used the ANOVA feature selection technique, a widely used statistical approach for comparing different independent means, to reduce the computing time and enhance the accuracy of the detection model, as shown in Figure 5.
The accuracy results after applying ANOVA feature selection (FS = 15) are shown in Table 7. Having applied the ML algorithms to the chosen sample three times (without any processing of the dataset, with preprocessing, and with feature selection), we can see that the preprocessing step is a critical and essential step in data mining for obtaining accurate results, especially when dealing with data that suffers from high dimensionality, imbalance, and overfitting. The accuracy results for the mentioned steps are presented in Figure 6. The best accuracy we obtained was for the Neural Network and SVM, with 96.7%. We can conclude from Figure 6 how the accuracy is affected by feature selection and by preprocessing the dataset before applying any ML algorithm to it.
Multi-Class Detection: In this phase, we use a sample consisting of three classes (Krack, Kr00k, and normal), 15,000 instances, and 254 features. Due to the importance of preprocessing, as noted in the previous subsections, we applied the preprocessing steps to the chosen sample: we removed the empty features and the features with constant values, and replaced all the empty cells in the remaining features with NaN. We then applied the ML algorithms using the Classification Learner application in MATLAB. The performance of the ML algorithms is presented in Table 8; the table presents the accuracy results with feature selection using the ANOVA technique and without feature selection.
The accuracy results for the mentioned steps are presented in
Figure 7.
The best accuracy we achieved without feature selection was 67.4%, for KNN, while applying ANOVA feature selection increased the performance of all the algorithms used. The best accuracy was achieved by the Ensemble classifier with 90.7%, followed by the Decision Tree with 88.3%.