To address cyber-attacks, every organisation or system follows some basic security measures. User authentication, firewalls [40], anti-virus software, data encryption and cryptography are the basic security measures implemented, but they are not strong enough to address present-day cyber-attacks, which are innovative and evolving rapidly. Intrusion detection systems (IDS) and intrusion prevention systems are therefore used in addition to the basic security measures. An IDS continuously monitors the information flow in the network and detects attack packets [41,42]. IDS are classified into network intrusion detection systems (NIDS) and host intrusion detection systems (HIDS). A NIDS is a software-defined system that monitors, captures and analyses network traffic. It detects malicious data packets by comparing them with already known attack patterns, but operating a NIDS is very difficult in busy and complex networks. A HIDS is a host-based system installed on individual devices; it monitors the information received on that particular device and generates alerts for any malicious packets found. Depending on the mode of operation, IDS are further classified into signature-based IDS [43] and anomaly-based IDS [44].
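The two detection modes can be contrasted with a minimal sketch. It assumes a hypothetical list of byte-pattern signatures for the signature-based check and, for the anomaly-based check, a single illustrative feature (packet size) with a three-standard-deviation threshold; none of these choices comes from the cited works.

```python
import statistics

# Hypothetical signature database: byte patterns of known attacks.
KNOWN_SIGNATURES = [b"' OR 1=1", b"../../etc/passwd", b"<script>"]


def signature_alert(payload: bytes) -> bool:
    """Signature-based check: flag a payload containing a known attack pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def fit_baseline(normal_sizes):
    """Learn a baseline (mean, standard deviation) from normal traffic only."""
    return statistics.mean(normal_sizes), statistics.stdev(normal_sizes)


def anomaly_alert(size, baseline, k=3.0):
    """Anomaly-based check: flag a value more than k standard deviations from the mean."""
    mean, std = baseline
    return abs(size - mean) > k * std


# Assumed packet sizes (bytes) observed during normal operation.
baseline = fit_baseline([510, 495, 502, 488, 505, 499, 501, 493])
print(signature_alert(b"GET /index.php?id=' OR 1=1"))
print(anomaly_alert(4000, baseline))
```

The signature check can only recognise patterns already in its database, whereas the anomaly check can flag previously unseen behaviour at the cost of depending on how well the baseline represents normal traffic.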
4.1. Artificial Intelligence for Cyber Security
AI’s broad scope and capabilities have allowed it to penetrate various fields, and cyber security has also been enhanced by its application [46]. Algorithms of different levels are applied in cyber security, and AI has evolved along with the increasing complexity of real-world systems.
Initially, basic machine learning algorithms, also called shallow models, were used for cyber security. Later, deep learning techniques capable of dealing with complex networks were introduced, and more recently reinforcement learning methods, which are regarded as futuristic and are claimed to be self-learning, have been proposed.
Figure 9 gives the classification of various ML models used for cyber security.
Machine learning models, referred to as shallow models, are further classified into supervised learning and unsupervised learning based on their learning procedure. In unsupervised learning, the outputs are grouped into clusters; these algorithms depend mostly on the internal patterns of the data. The k-means algorithm is used to detect malicious entries into the network [47]; it groups the unlabelled data into clusters, where the value of K indicates the number of clusters. Dividing the data into groups in this way gives insight into both known and unknown attack patterns. Sequential pattern mining [48], a subset of data mining, is another data analytic method that provides knowledge of attack patterns; it sends an alert if any malicious or abnormal activity is registered. Another data mining method, used to detect web intrusion, is the apriori algorithm [49]; running on a specific rule set, it keeps track of frequently occurring data patterns and indicates when a new pattern is detected.
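The clustering step of k-means can be sketched in a few lines. This is a plain textbook version, not the implementation of the cited work; the two features per connection (duration and kilobytes sent) and the sample values are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical features per connection: (duration in s, kilobytes sent).
traffic = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1), (1.1, 2.2),
           (9.0, 40.0), (9.5, 42.0), (8.8, 39.0)]
centroids, labels = kmeans(traffic, k=2)
print(labels)
```

The algorithm only separates the data into K groups; deciding which group corresponds to malicious traffic still requires inspection by an analyst or a further rule.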
Supervised learning methods are trained on data that already carries class labels, against which the model's classifications or predictions are verified. The k-nearest neighbours (KNN) method is used to classify an incoming entry as normal or malicious [50]. Naïve Bayes is a probabilistic method based on Bayes' theorem; with it, the probability that a field is prone to attack can be calculated [51]. The support vector machine (SVM) is a classification method that separates intrusions from normal entries in the dataset. SVM uses a kernel that facilitates the classification of even complex and nonlinear data; it can map the data into a higher-dimensional space if the decision boundary cannot be determined in the original dimension [52]. Decision trees and random forests are tree-based classifiers [53]. In a decision tree, a tree-like structure is created from the training data; predictions are made by following the tree's structure, and unknown entities can be sorted out [54,55]. The random forest follows a similar method, but instead of a single tree a large group of trees is created, and the final classification is decided by a voting process among the trees [56,57,58].
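As an illustration of supervised classification, the KNN rule described above can be sketched as follows. The two features (failed logins and requests per second), the labelled entries and the choice k = 3 are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled neighbours."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled entries: (failed logins, requests per second) -> class.
train = [
    ((0, 5), "normal"), ((1, 7), "normal"), ((0, 6), "normal"), ((2, 8), "normal"),
    ((30, 90), "malicious"), ((25, 80), "malicious"), ((28, 95), "malicious"),
]
print(knn_predict(train, (1, 6)))
print(knn_predict(train, (27, 85)))
```

The same majority-vote idea, applied across trees rather than across neighbours, is what decides the final classification of a random forest.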
Deep learning (DL) models are designed to handle complex and non-linear systems and are considered superior to shallow ML models in this respect. The architecture of DL models also differs from that of ML models; there is no single fixed algorithm for this class of model [59]. A DL model consists of neurons arranged in layers; the working of these neurons is inspired by the human brain, and the neurons of each layer are connected to those of the next. Information passes from the input layer to the output layer through multiple hidden layers. Working with a DL model involves two stages, the training stage and the testing stage. The training stage consists of modifying the weight of each connection over multiple iterations; this process makes the DL model learn the patterns in the data fed to the network.
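The weight modification performed during the training stage can be sketched with a very small network trained by backpropagation. The single hidden layer of three neurons, the sigmoid activation, the learning rate and the XOR-style toy data are all assumptions chosen to keep the example short; practical DL models are far larger and are normally built with a dedicated framework.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Input layer -> one hidden layer -> single output neuron."""
    # Each weight row ends with a bias term, hence the appended 1.0.
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output


def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data) / len(data)


def train(data, n_hidden=3, epochs=5000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    history = []  # loss at the start of every epoch, plus the final loss
    for _ in range(epochs):
        history.append(mean_squared_error(data, w_hidden, w_out))
        for x, y in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Backpropagation: error signal at the output, then at each hidden neuron.
            delta_out = (out - y) * out * (1 - out)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            # Weight modification: one small step per connection.
            for j, v in enumerate(hidden + [1.0]):
                w_out[j] -= lr * delta_out * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] -= lr * delta_hidden[j] * v
    history.append(mean_squared_error(data, w_hidden, w_out))
    return w_hidden, w_out, history


# Toy non-linear data (XOR pattern), used only to exercise the training loop.
xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
w_hidden, w_out, history = train(xor)
print(f"loss before training: {history[0]:.3f}, after: {history[-1]:.3f}")
```

The recorded loss history makes the effect of the repeated weight updates visible: the error at the end of training is lower than the error of the randomly initialised network.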
The efficiency of the trained model is then tested on the testing data. Deep neural networks (DNN) have the structure discussed above with multiple hidden layers; increasing the depth of the network gives the model the ability to classify nonlinear data [60,61]. Convolutional neural networks (CNN) are widely used for image classification; the data to be classified is converted into an image format, and the malicious data is then identified [62,63]. Recurrent neural networks (RNN) are used for time series data; this network model predicts the next data sample based on the previous output and the present inputs [64]. However, this model suffers from memory issues, and outliers and extreme cases are often treated as attack vectors.
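The recurrence at the heart of an RNN can be sketched with a single recurrent unit. The weights below are hand-set, hypothetical values rather than trained ones, and the tanh activation is an assumption.

```python
import math


def rnn_states(inputs, w_in=1.0, w_rec=0.5, bias=0.0):
    """Run a one-unit recurrent cell over a sequence and return every hidden state."""
    h = 0.0  # hidden state carried from one time step to the next
    states = []
    for x in inputs:
        # The new state mixes the present input with the previous state.
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


# A single non-zero input followed by zeros.
states = rnn_states([1.0, 0.0, 0.0, 0.0, 0.0])
print([round(s, 3) for s in states])
```

In this sketch the contribution of the first input shrinks at every subsequent step, which is one common reading of the memory limitation of the classical RNN.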
To overcome the memory limitation of classical RNNs, models such as long short-term memory (LSTM) and the gated recurrent unit (GRU) were introduced; they contain a memory element, and their network architecture also differs from that of the classical RNN [65]. Generative adversarial networks (GAN) and autoencoders are unsupervised deep learning techniques in which the outputs are not specified. The GAN model consists of two networks, the generator and the discriminator. The generator takes an input sample and generates a sample of data; the generated sample is compared with the training data, or real samples, by the discriminator, which then decides whether the incoming data sample is real or fake [66,67]. An autoencoder is a neural network architecture that is often used for video and image classification [68]. The input data is compressed to a lower dimension called the latent space, which retains the most prominent features of the data. From the latent space, the autoencoder tries to recreate the input data at its output; normal and fake data are distinguished by comparing the autoencoder's output with its input. During the training phase, autoencoders are trained to reproduce the input at the output as closely as possible; a large difference between output and input therefore indicates attacked data.
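The use of reconstruction error for detection can be sketched without training a network. The example below substitutes a hand-set linear projection for a trained autoencoder, assuming that normal samples lie near the line y = 2x so that a one-dimensional latent space suffices; the sample values and the 50% threshold margin are likewise assumptions.

```python
import math

# Assumed, hand-set weights standing in for a trained linear autoencoder:
# normal samples are taken to lie near the line y = 2x.
DIRECTION = (1 / math.sqrt(5), 2 / math.sqrt(5))


def encode(sample):
    """Compress a 2-D sample into a 1-D latent value."""
    return sample[0] * DIRECTION[0] + sample[1] * DIRECTION[1]


def decode(latent):
    """Recreate a 2-D sample from the latent value."""
    return (latent * DIRECTION[0], latent * DIRECTION[1])


def reconstruction_error(sample):
    """Distance between the input and the autoencoder's output."""
    return math.dist(sample, decode(encode(sample)))


normal_samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.9)]
# Threshold: the largest error seen on normal data, with an assumed 50% margin.
threshold = 1.5 * max(reconstruction_error(s) for s in normal_samples)


def is_attack(sample):
    """Flag a sample whose reconstruction error exceeds the threshold."""
    return reconstruction_error(sample) > threshold


print(is_attack((2.0, 4.05)))
print(is_attack((3.0, 1.0)))
```

A sample that follows the pattern of the normal data is reconstructed almost exactly, while a sample that departs from it loses information in the latent space and produces a large error.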
Reinforcement learning (RL) is an advanced, futuristic architecture proposed to achieve self-learning [69]. RL, also known as reward-based learning, works on the reward obtained for the action performed in the previous iteration. The agent is placed in a customized environment with predefined rules, goals and reward criteria. The model that reaches the goal with the highest reward is considered the optimized model; the RL model continuously updates its decision-making, or policy, based on the rewards.
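The reward-driven policy update can be sketched with tabular Q-learning, which is one widely used RL algorithm and is chosen here as an assumption, since the text does not name a specific method. The five-state corridor environment, its reward rule and all hyperparameters are hypothetical.

```python
import random

N_STATES = 5          # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Hypothetical environment rules: reward 1 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward = step(state, ACTIONS[a])
            # Reward-driven update of the action-value estimate.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if state == N_STATES - 1:
                break
    return q


q = q_learning()
# Greedy policy for the non-terminal states, read off the learned values.
policy = ["left" if row[0] > row[1] else "right" for row in q[:-1]]
print(policy)
```

After training, the greedy policy moves towards the goal from every state, which is the behaviour that collects the reward in the fewest steps.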
Popular benchmark datasets such as KDD99 and DARPA are used to evaluate the performance of deep learning and machine learning algorithms. Initially, machine learning algorithms were implemented on the KDD99 dataset, with the following performance: naïve Bayes with 97% accuracy [70], SVM with 93% accuracy [71], decision tree with 94.3% accuracy [72], random forest with 99% accuracy [73] and deep belief networks with 96.5% [74]. The same KDD99 dataset was then classified using deep learning models, with the following performance: GRU with 98.64% [75] and CNN-LSTM with 99.7% [76,77,78].
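For reference, the accuracy figures quoted above are the fraction of test samples classified correctly. A minimal sketch of the computation follows; the labels are toy values for illustration and are not taken from KDD99 or DARPA.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Toy labels for illustration only; these are not results from any dataset.
y_true = ["normal", "attack", "normal", "attack", "normal"]
y_pred = ["normal", "attack", "attack", "attack", "normal"]
print(accuracy(y_true, y_pred))
```

Accuracy alone can be misleading when attack samples are rare, so it is often read alongside other measures such as the false alarm rate.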