Preprint, Article, Version 1. Preserved in Portico. This version is not peer-reviewed.
Optimizing Intrusion Detection Systems: Exploring the Impact of Feature Selection, Normalization and Three-Phase Precision on the CSE-CIC-IDS-2018 Dataset
Songma, S.; Sathuphan, T.; Pamutha, T. Optimizing Intrusion Detection Systems in Three Phases on the CSE-CIC-IDS-2018 Dataset. Computers 2023, 12, 245.
Abstract
This paper investigates intrusion detection systems using the CSE-CIC-IDS-2018 dataset. The research is divided into three phases. First, Data Cleaning, Exploratory Data Analysis, and Data Normalization (min-max and z-score) are applied to prepare the data for the different classifiers. Second, the number of features is reduced using a combination of Principal Component Analysis (PCA) and Random Forest (RF) feature importance, with the goal of improving processing speed and decreasing model complexity. This phase includes a comparison with the entire dataset. Finally, machine learning algorithms (XGBoost, CART, DT, KNN, MLP, RF, LR, and Bayes) are applied to the selected features and preprocessing approaches. The XGBoost, DT, and RF models outperform the other classifiers in both ROC values and CPU runtime. Following evaluation, which includes PCA and RF feature selection, an optimal set is produced.
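The two normalization techniques named in the first phase can be illustrated with a minimal sketch. It is not the authors' code: the function names `min_max_scale` and `z_score_scale` and the sample values in `flow_durations` are hypothetical, chosen only to show how each method rescales a single numeric feature.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Center values on the mean and divide by the population std. dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Avoid division by zero for a constant feature.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical values for one numeric feature (illustrative only).
flow_durations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

scaled_min_max = min_max_scale(flow_durations)
scaled_z_score = z_score_scale(flow_durations)
```

Min-max scaling bounds every value between 0 and 1, while z-score scaling yields a feature with zero mean and unit variance. Which one suits a given classifier better is the kind of question the paper's comparison addresses.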
Keywords
intrusion detection system; machine learning techniques; exploratory data analysis; performance evaluation; feature selection; CSE-CIC-IDS-2018 dataset; three-phase models
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.