We carried out a systematic review of the literature to map existing work that explores the use of programmable networks in the context of feature selection applied to IDSs.
3.2. Discussion and Open Issues
The word cloud presented in Figure 2 summarizes the central theme in 24 review articles that explore specialized approaches to enhance network security through feature extraction. The prominence of keywords such as “software”, “learning”, “detection”, “intrusion”, and “network” highlights the intersection of software-defined methodologies and machine-learning techniques in the quest for robust intrusion detection and prevention systems. The rest of this section discusses the selected articles.
According to Zainudin et al. [26], detecting malicious network activities in SDN-based Industrial Internet of Things (IIoT) networks tends to be challenging. This is due to the resource-constrained environment in which IIoT operates and the multitude of tasks that the SDN controller is designed to perform, including network management, monitoring, and load balancing. To address these issues, the authors propose a feature selection mechanism using the Light Gradient Boosting algorithm (LightGBM) to select important features, reducing model complexity and improving the performance of the intrusion detection model. Additionally, a lightweight CNN-GRU model is introduced to classify cyber-attacks on these networks. Using the public InSDN dataset, the model achieved 98.95% accuracy, 99.00% precision, 98.91% recall, and a time cost of 0.164 ms with the proposed feature selection technique. The article also evaluated other feature selection techniques, such as Extreme Gradient Boosting (XGBoost) and Extremely Randomized Trees (Extra Trees), which showed promising results but with a slightly higher time cost and lower classification accuracy and precision. On these results, LightGBM was the most suitable feature selection technique.
AlMasri et al. [27] propose a combination of Machine Learning (ML) and network programmability to protect networks, specifically against DoS and Port Scanning (Probe) attacks. The proposed ML approach uses ANOVA for feature selection. Four ML models were tested with the features selected by the ANOVA FS technique, utilizing the NSL-KDD training dataset, comprising
rows and 43 columns. The NSL-KDD test dataset was also employed, consisting of
rows and the same number of columns as the training dataset. Of the four models, Naive Bayes performed best, reaching 86.9% accuracy for DoS attack detection and 93.5% for Probe attacks. In summary, the proposed technique shows promise and has been effective in controlled environments and datasets. However, as noted in the paper’s conclusion, further tests in a real environment or on virtual machines (VMs) are still needed to validate its efficiency and accuracy. Furthermore, performance measurements are needed to demonstrate its capability to handle high data flow.
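The ANOVA-based ranking described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy rows, the function names, and the choice of k are assumptions made for illustration, and each feature is scored with the standard one-way ANOVA F statistic against the class label.

```python
from statistics import mean

def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature against class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_k_by_f_score(rows, labels, k):
    """Return the indices of the k highest-scoring features and all scores."""
    n_features = len(rows[0])
    scores = [anova_f_score([r[j] for r in rows], labels)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores

# Hypothetical toy data: columns are [duration, connection_count, noise].
rows = [
    [0.1, 2, 5.0], [0.2, 3, 1.0], [0.1, 1, 4.0], [0.3, 2, 2.0],      # normal
    [0.2, 50, 4.5], [0.1, 60, 1.5], [0.3, 55, 3.0], [0.2, 45, 2.5],  # dos
]
labels = ["normal"] * 4 + ["dos"] * 4

selected, scores = top_k_by_f_score(rows, labels, k=2)
print(selected, [round(s, 3) for s in scores])
```

In this toy set only the connection count separates the two classes, so it receives by far the largest F score. The sketch assumes at least two classes and more samples than classes.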
Sampath et al. [28] address the challenge of rapidly advancing attacks on SDN-based networks, proposing an automated solution for rule generation using the Genetic Algorithm (GA) in conjunction with IDSs. This approach aims to predict and prevent the formation of new attacks by automating the rule generation process. The distinctive feature of this proposal lies in its rule generation mechanism, which leverages the GA to create new rules based on data flow behavior. Additionally, a general ruleset serves as a foundation for generating rules that effectively block malicious activities. Inspired by biological concepts like natural selection and evolution, the GA is integrated with a feature selection technique to identify optimal rules and combinations. These enhanced flow rules can preemptively thwart attacks before they reach the data plane in the SDN architecture. The approach was tested in a controlled environment comprising virtual machines with an SDN-based topology, including a controller, a switch, and two hosts. Results indicated that the GA can generate rules that prevent the emergence of new attacks. While the proposal succeeded in this controlled setting, further testing in a real-world SDN-based network environment is recommended to validate its performance and to assess its potential for enhancing the security of SDN-based networks.
According to Roy et al. [29], the Wireless Sensor Network (WSN) is a high-speed network designed to measure, monitor, and collect data from diverse contexts using a distributed, spatially sparse network of sensors. Addressing the challenge of intrusion detection in such networks, this study proposes a model that employs feature selection to reduce the number of extracted features. This minimizes the data volume traversing the network and alleviates the network traffic load, thereby expediting the intrusion detection process. Utilizing a Fully Convolutional Network (FCN) and the UNR-IDD dataset, the model achieved accuracy rates between 98% and 99%. The model also attained 100% precision, correctly classifying all malicious traffic patterns. Despite the FCN’s suitability for intrusion detection, its computational performance remains an open issue, which makes it unsuitable for real-time intrusion detection in WSNs. This limitation is particularly pertinent as WSNs typically operate with constrained computational resources and network bandwidth.
Jankowski et al. [30] introduce the Monitoring and Detection of Malicious Activities in SDN (MADMAS) system, which leverages native SDN mechanisms and employs data exploration techniques to identify and process features for network traffic classification. The MADMAS system utilizes Independent Component Analysis (ICA) and PCA to reduce the feature space, enhancing the efficiency of SDN traffic classification and notably increasing the detection of unauthorized activity. These results underscore the role of feature space reduction in improving malicious traffic detection.
To address the challenge of detecting intrusions in SDNs, Janabi et al. [31] propose a technique to enhance the performance of IDSs within SDNs, particularly catering to the demands of large enterprise networks. Their method minimizes the overhead during IDS operations through a two-stage process: optimized feature selection and extraction. In the first stage, a correlation-based feature selection (CFS) algorithm is employed to filter and select the most relevant features from the input data, thereby reducing the computational cost of subsequent analysis. In the second stage, PCA is applied to further reduce the dimensionality of the selected features, resulting in fewer features used in the IDS. This approach accelerates the IDS process and mitigates overhead. However, further validation of this technique in real-world enterprise network environments, especially in large-scale scenarios, would provide valuable insights into its practical applicability.
Friha et al. [32] worked on reducing feature dimensionality by selecting highly correlated features, which can optimize resource usage and enhance accuracy. Additionally, employing distributed ML approaches, such as Federated Learning, can further improve the solution. The Pearson Correlation Coefficient (PCC) and Chi-Square techniques were chosen for feature selection on both the InSDN and EDGE-IIoTset datasets. Using a convolutional neural network model and the top 19 features selected with PCC, the achieved accuracy reached 96.06% for InSDN and 99.68% for EDGE-IIoTset. This outperformed the results without feature selection, which were 95.22% and 98.32%, respectively. A reduction in computation time was also demonstrated, from 21.05 seconds to 15.91 seconds for the first dataset and from 9.70 to 7.10 seconds for the second dataset. While the Federated Learning model slightly decreased accuracy compared to a centralized model, the loss was minimal, at most 0.07%. It nevertheless outperformed existing IDS models in accuracy and resource usage, requiring 0.068 MFLOPs compared to 0.209 MFLOPs for the second-best model in the resource usage category.
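As a rough illustration of PCC-based selection, the sketch below ranks features by the absolute Pearson correlation between each column and a binary class label, then keeps the top k. This is an assumption about how the correlation is applied, not a description of the cited implementation; the feature names and values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)

def top_k_by_pcc(columns, target, k):
    """Rank named feature columns by |PCC| with the target and keep k."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Hypothetical flow features; target is 1 for attack traffic, 0 otherwise.
columns = {
    "flow_duration": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "pkt_count": [10, 12, 11, 40, 42, 41],
    "ttl": [64, 64, 64, 64, 64, 64],
}
target = [0, 0, 0, 1, 1, 1]

selected, scores = top_k_by_pcc(columns, target, k=2)
print(selected)
```

The constant `ttl` column scores zero and is dropped, while `pkt_count` ranks first because it tracks the label most closely.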
While Industrial IoT networks are recognized as potential targets for attacks, it is essential to acknowledge that home IoT networks are also susceptible to such threats. In this context, the challenge lies in defining the relevant features within a dataset. While feature selection algorithms provide a solution, manual selection is also viable. This involves categorizing features and linking them to specific types of attacks. For instance, categories like Traffic features (time-based or connections to the same host or service) and Packet header features (IP and TCP headers) could be employed for DoS attacks. Manual selection may suffice for simple attacks, but fine-tuning feature selection becomes challenging to maintain accuracy in more complex attacks. The significance of feature selection is underscored, emphasizing that enhancements in Intrusion Detection Systems (IDS) can be achieved by improving feature selection rather than solely depending on the development of new ML models [11].
While feature selection brings the benefit of reducing computational complexity and has the potential to improve accuracy in various network types, it does not always guarantee an enhancement in accuracy. In a study utilizing a recurrent neural network with gated recurrent units and Long Short-Term Memory (LSTM), combined with the ANOVA F-test and Recursive Feature Elimination (RFE), the false alarm rate was reduced to 0.76%. However, this approach resulted in an overall accuracy decrease to 87% compared to other models [33]. Another study also employed recurrent neural network (RNN) and LSTM approaches but differed in the feature selection algorithm and datasets. Utilizing the Information Gain (IG) filter method and Random Forest, this solution reduced the features from 48 to only 10. Achieving 98.76% accuracy for Random Forest and 99.5% for IG, it outperformed the models it was compared against in the paper, where the best accuracy was 96.5% for the CICIDS2017 dataset [7].
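To make the Information Gain filter concrete, the following sketch computes IG as the reduction in label entropy obtained by splitting on a feature. It is only an illustration under simplifying assumptions: features are treated as categorical (continuous flow features would need binning first), and the feature names and values are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy left after splitting on the feature."""
    total = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    conditional = sum(len(ys) / total * entropy(ys)
                      for ys in by_value.values())
    return entropy(labels) - conditional

# Hypothetical categorical features for four flows.
labels = ["attack", "attack", "normal", "normal"]
features = {
    "protocol": ["tcp", "tcp", "udp", "udp"],  # perfectly aligned with the label
    "flag": ["S", "A", "S", "A"],              # independent of the label
}

gains = {name: information_gain(vals, labels) for name, vals in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking, gains)
```

A filter of this kind keeps the highest-ranked features and discards the rest before any classifier is trained, which is how a reduction such as 48 to 10 features would be obtained.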
Feature selection can be executed by eliminating features from a dataset based on a correlation-defined rank. Another approach involves starting with zero features and gradually adding features until reaching the defined limit. Forward Feature Selection (FFS) adopts this iterative approach, evaluating each feature’s correlation individually, pairing it with others, and incrementally adding features until the desired number is achieved. In a study employing FFS, the 5 best features out of a pool of 20 were selected to complement a hybrid model combining a CNN and an RNN. The resulting model achieved 98.09% accuracy with a false positive rate of 0.02% [34].
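The greedy loop behind forward selection can be sketched as follows. This is a generic illustration rather than the cited study's procedure: the scoring function here is the training accuracy of a simple nearest-centroid classifier, chosen only because it needs no external libraries, and the data are hypothetical.

```python
def centroid_accuracy(rows, labels, features):
    """Training accuracy of a nearest-centroid classifier using only `features`."""
    sums, counts = {}, {}
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += row[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(row):
        return min(centroids, key=lambda y: sum(
            (row[f] - c) ** 2 for f, c in zip(features, centroids[y])))

    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def forward_select(rows, labels, k, score=centroid_accuracy):
    """Start with no features and greedily add the one that scores best."""
    selected = []
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(rows, labels, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical data: only the feature at index 2 separates the classes.
rows = [
    [0.9, 5.0, 0.1, 3.0], [0.2, 1.0, 0.2, 7.0],
    [0.8, 4.0, 0.0, 2.0], [0.3, 2.0, 0.3, 6.0],   # normal
    [0.7, 5.0, 0.9, 3.0], [0.1, 1.0, 1.0, 7.0],
    [0.9, 4.0, 0.8, 2.0], [0.2, 2.0, 1.1, 6.0],   # attack
]
labels = ["normal"] * 4 + ["attack"] * 4

selected = forward_select(rows, labels, k=2)
print(selected)
```

In practice the score would come from cross-validated performance of the target model, and the loop would stop at the desired number of features (5 out of 20 in the study above).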
One effective technique, known as the Gradient Boosting Feature Selection Module, assesses the importance of features in decision-making within individual decision trees, which are then compared across various trees. The findings from this approach [35] revealed that more than 75% of the features in the UNSW-NB15 dataset lacked significant relevance for accurate classification. Integrating AdaBoost, a method that combines decision trees and stumps into a robust model, the study attained an accuracy of 97% within a training duration of less than 43 seconds, surpassing the performance of the other methods evaluated.
Scaranti et al. [36] introduce an IDS based on Artificial Immune Systems (AIS) designed to detect anomalies in SDNs in near real-time. The AIS-IDS comprises three integrated modules within the SDN controller: Flow Collector, AIS Detection, and Mitigation. The Flow Collector acquires and preprocesses IP flows, while the AIS Detection module classifies network behavior as normal or abnormal using AIS. The Mitigation module then responds by creating forwarding rules to block malicious traffic. The proposal leverages key features of IP flows, such as source and destination IP addresses and ports, for precise anomaly detection. A sliding window technique adapts to dynamic SDN changes, ensuring rapid detector generation and improving detection capacity. The IDS identifies anomalies and proactively responds to block attacks, enhancing overall security. Experimental results in an emulated environment demonstrate high efficacy, with an F-measure exceeding 99.9%. Evaluation using a public dataset of attacks further attests to the IDS’s versatility and adaptability, achieving a performance exceeding 92% in detecting various attack types without prior information.
El Houda et al. [37] present BoostIDS, a framework designed for the detection and mitigation of security threats in Smart Grid (SG) systems based on SDN. The framework uses ensemble learning to address common challenges in intrusion detection systems using ML and Deep Learning (DL). Given the critical role of SDN-based SG in electric power systems and the security challenges it faces, BoostIDS comprises two main modules: one for data monitoring and feature selection, utilizing an efficient boosting-based feature selection algorithm, and another for threat detection based on ensemble learning. Extensive experiments with real datasets (NSL-KDD and UNSW-NB15) showcase BoostIDS’s effectiveness in efficiently detecting and mitigating threats in SDN-based SG systems. Performance metrics, including accuracy, detection rate, F1 score, and training time, demonstrate superior results to other ML/DL-based intrusion detection models. In conclusion, BoostIDS stands out as a prominent framework for enhancing cybersecurity in SDN-based SG systems, overcoming limitations through its ensemble learning approach, as validated by extensive experiments and optimization of training/test complexity.
Ganesan and Sarac [38] explore security threats in SDN environments, explicitly focusing on evasion-based intrusions. The research emphasizes the susceptibility of ML-based intrusion detection systems to evasion attacks, where adversaries manipulate packet features to avoid detection. Instead of relying solely on complete datasets, the article proposes using multiple sets of reduced features to enhance intrusion detection capabilities in SDN environments. This approach is grounded in Permutation Feature Importance (PFI), a method that evaluates the relevance of each feature to the effectiveness of ML models. The proposed strategy involves identifying important feature sets, training ML classifiers with these reduced feature sets, and using an ensemble of classifiers to improve NIDS accuracy. PFI and Orthogonal Feature Ranking (OFR) are employed to identify crucial features in the dataset. Evaluations demonstrate that the hybrid multi-classifier system outperforms conventional classifiers when subjected to adversarial evasion attacks. PFI involves thoroughly analyzing features to identify essential sets that enable ML classifiers to maintain robust performance in intrusion detection, even with reduced feature sizes. The primary objective is to enhance the resilience of ML classifiers against evasion attacks by diversifying and optimizing the feature sets used for model training.
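The general idea of PFI can be shown with a short sketch: shuffle one feature column at a time and record how much a fixed model's accuracy drops. The model below is a hypothetical threshold rule standing in for a trained classifier, and the data, repeat count, and seed are illustrative assumptions rather than details from the cited work.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the correct label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical stand-in classifier: it only looks at feature 0."""
    return "attack" if row[0] > 100 else "normal"

# Hypothetical flows: [packets_per_second, destination_port].
rows = [
    [10, 80], [20, 443], [15, 22], [30, 80],
    [500, 443], [650, 80], [700, 22], [550, 443],
]
labels = ["normal"] * 4 + ["attack"] * 4

importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

Because the stand-in model ignores the second feature, permuting it never changes the predictions and its importance is exactly zero, whereas permuting the first feature degrades accuracy. Features with the largest drops are the candidates for the reduced feature sets.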
El Houda et al. [39] introduce a specialized multi-level machine learning framework tailored for advanced attack detection in SDN environments. The framework consists of three key modules: a Data Flow Collection (DFC) module utilizing the sFlow protocol, an Information Gain Feature Selection (IGF) module, and an unsupervised ML module employing Isolation Forest (ML-IF) for anomaly detection. While the exact features are not specified, the evaluation employs the UNSW-NB15 dataset, known for its diverse characteristics related to security threats. The IGF module streamlines the training and testing processes by selecting the most informative features. Based on isolation forests, the ML-IF module effectively identifies and classifies security threats in SDN environments. Experimental results in the OMNeT++ emulator with the UNSW-NB15 dataset showcase the framework’s superiority over recent contributions, achieving a precision of 97% and a detection rate of 96%, while significantly reducing computational complexity. This contribution stands as a promising solution for addressing evolving security threats in SDN, contributing valuable insights to exploring feature selection techniques in SDN Intrusion Detection Systems.
Mbasuva and Zodi [40] address the vulnerability of SDNs to DDoS attacks due to their centralized architecture. The proposed solution is an Ensemble Deep Learning-based IDS tailored for detecting DDoS attacks in SDNs. Utilizing the CIC-IDS2017 dataset, the model combines Convolutional Neural Network (CNN), Deep Neural Network (DNN), and RNN architectures. Key features identified in the literature, including Bwd Packet Length, Avg Packet Size, Flow Duration, and Flow IAT Std, are selected for accurate DDoS detection. The ensemble model outperforms individual and ensemble models in the literature, demonstrating notable effectiveness in DDoS attack detection. Future work includes simulating the model on platforms like Mininet and OFNet, implementing mitigation measures for DDoS attacks, and extending detection to the application and data layers.
Firdaus et al. [41] propose addressing SDN vulnerabilities to DDoS by utilizing ML techniques, specifically employing an Ensemble Algorithm. The authors conducted experiments using the InSDN dataset, employing a two-stage methodology. The first stage involved feature selection, normalization, clustering, and Ensemble algorithm classification. The second stage validated the detection in SDN using the Mininet emulator, utilizing Ensemble K-means++ and Random Forest algorithms. The approach, centered on Machine Learning and Ensemble Algorithms, aims to enhance SDN security by achieving more efficient detection of DDoS attacks. The use of the InSDN dataset and validation through the Mininet emulator contribute to the method’s robustness and practical applicability.
Kanagaraj et al. [42] propose the application of deep learning to enhance IDS/IPS functionalities, aiming to reduce human effort in data preprocessing and feature selection. Multiple deep learning models are tested, with the selected model trained on the NSL dataset, a recognized benchmark. The integrated model offers intrusion detection, malware detection, and traffic analysis, contributing to network security by providing a robust defense against evolving threats. Incorporating deep learning strengthens the network’s resilience and its ability to remain secure against various attacks.
Amarudin et al. [43] address the challenge of false positive detections in ML-based IDSs. False positives are attributed to harmful ML techniques, prompting the proposal of the S-SDN model. S-SDN, an Ensemble Learning (EL) model, is constructed through the stacking technique of three base learners (SVM, Decision Tree, and Naïve Bayes). It serves as a classifier in IDS for intrusion detection and is validated with the UNSW-NB15 dataset. Experimental results show that S-SDN outperforms the previous method based on a single classifier. S-SDN achieves an accuracy of 83.19%, surpassing the SVM with 75.89% accuracy and the ensemble classifier (Bagging-DT) with 80.09% accuracy. Despite promising results, the research emphasizes the ongoing need for improvements in EL-based IDS development, proposing an EL model with resource selection techniques and diverse base learners. Continuous advancements are deemed essential in this domain.
Abdulqadder et al. [44] focus on implementing key technologies like SDN and Network Function Virtualization (NFV) to support advanced 5G networks. Due to the challenges in security provisioning for the large number of users in 5G networks, an advanced attack-aware security provisioning scheme is proposed. The scheme involves the Initial Authentication Process, Packet Classification, and Switch Migration Process. Initial authentication employs a Secure Identity (SIA)-based scheme at the access point. Suspicious packets are identified and classified into Virtual Network Function (VNF) by the controller, using a Genetic Algorithm with Correlation (GAC) based feature selection algorithm leading to a Radial Basis Function with Extreme Learning Machine (RBF-ELM) classifier. Malicious packets are dropped in the VNF, and normal packets are redirected to the destination through the controller. To mitigate overload attacks on the flow table, an Enhanced Artificial Bee Colony (EABC) algorithm is introduced in the controller. Experimental results demonstrate superior performance regarding delay, redirected packets, detection accuracy, packet transmission rate, and packet loss rate for the proposed scheme.
Suresh et al. [45] present an IDS for software-defined IoT networks based on artificial intelligence, incorporating the self-adaptive energy-efficient BAT algorithm. The methodology involves three stages: preprocessing, feature extraction and selection using the information gain algorithm, and classification using SVM. The self-adaptive energy-efficient BAT algorithm is designed to optimize feature selection through a fitness-based parallel task processing strategy, improving scalability and energy efficiency. The study uses the NSL KDD CUP 1999 dataset for evaluation, considering parameters like accuracy, precision, recall, and additional time. The proposed algorithm enhances feature selection by dynamically adapting to changing environmental conditions, particularly in high-traffic environments. While highlighting improvements over conventional BAT swarms, the study identifies limitations in energy efficiency and suggests areas for future enhancement, such as error-handling procedures and process revocation functionalities. Overall, the work contributes to advancing intrusion detection systems, artificial intelligence, and security in software-defined IoT networks.
Govindaraju et al. [46] address the increasing security concerns related to the widespread adoption of IoT services, particularly the threat of DDoS attacks targeting IoT devices. The study advocates using SDN as a secure management solution for these devices. The focus is on efficiently detecting DDoS attacks in SDN by employing optimized models based on deep learning. The proposal involves collecting normal and DDoS traffic characteristics from SDN datasets. The NSL-KDD dataset is recommended for feature selection to simplify models for readability and reduce training time. The research proposes a real-time DDoS attack detection system in SDN using an LSTM model. Applying an artificial gorilla troop optimizer for feature selection from the NSL-KDD dataset results in high classification accuracy. Their IDS achieves a detection accuracy of 97.59% while reducing processing loads and execution times.
Ahn et al. [47] address the challenge of explainability in deep learning models used in traffic classification, particularly in network functions like SDN and network intrusion detection systems. While various methods have been proposed for classifying encrypted traffic without inspecting packet payloads, the lack of explainability in these models raises concerns, especially when dealing with malicious or incorrect data in the training set. To tackle this issue, the paper proposes an explainable artificial intelligence (XAI)-based method utilizing a genetic algorithm (GA). The proposed method aims to elucidate how the deep learning-based traffic classifier functions, providing quantifiable importance measures for each feature. Additionally, the GA generates a feature selection mask that highlights the most significant features across the entire dataset. The practical implementation of this approach resulted in a deep learning-based traffic classifier with an accuracy of approximately 97.24%. These results indicate that the GA-based XAI method holds promise in offering valuable insights to enhance the understanding and reliability of traffic classification models in intricate network environments.
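A GA that evolves a binary feature selection mask can be sketched as below. This is a generic, simplified illustration and not the method of the cited paper: the fitness function is a hypothetical stand-in (per-feature relevance minus a small cost per selected feature), whereas a real system would derive fitness from the classifier's performance with the mask applied. Population size, mutation rate, and seed are arbitrary assumptions.

```python
import random

def evolve_mask(n_features, fitness, pop_size=30, generations=40,
                mutation_rate=0.05, seed=0):
    """Evolve a 0/1 mask over features; returns the best mask and a fitness history."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]  # elitism: carry the best mask forward
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, history

# Hypothetical per-feature relevance scores (illustrative values only).
RELEVANCE = [0.9, 0.0, 0.05, 0.7, 0.0, 0.6, 0.02, 0.0]

def toy_fitness(mask):
    """Reward relevant features and charge a small cost per selected feature."""
    gain = sum(r for r, bit in zip(RELEVANCE, mask) if bit)
    return gain - 0.1 * sum(mask)

best_mask, history = evolve_mask(len(RELEVANCE), toy_fitness)
print(best_mask, round(toy_fitness(best_mask), 2))
```

Because the best mask of each generation is copied unchanged into the next, the recorded best fitness never decreases. The bits set in the final mask indicate which features the search considers most significant.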
The mentioned studies highlight the significant role of feature selection as a valuable tool in enhancing efficiency and accuracy, emphasizing the need for careful consideration in evaluating attack detection. While leveraging established techniques from the literature is a prudent strategy, exploring novel feature selection methods remains crucial. In this context, the following section shows a comprehensive comparison of techniques for selecting attributes in datasets.