1. Introduction
Remote sensing technology has developed rapidly in recent years. High-resolution satellite remote sensing provides valuable support for meteorology, terrain surveying, military applications, agriculture, and other fields [1,2]. However, due to distance and equipment constraints, satellite remote sensing has limitations in applications that require ultra-high precision and multimodal information [3]. To address these issues, Internet of Things (IoT)-based near-earth remote sensing technology has made significant progress and now plays a crucial role in agricultural analysis, mining, water monitoring [4], and other fields. A near-earth remote sensing system has the vertical structure shown in Figure 1 and includes a large number of IoT devices [5], such as weather balloons, airplanes, drones, and sensors. These devices are connected to the base station via an IoT network and collect various kinds of remote sensing information, providing hardware support for subsequent data analysis. Figure 2 shows the deployment of near-earth remote sensing devices for monitoring crops in farmland.
A near-earth remote sensing system requires the deployment of a large number of IoT devices, most of which are outdoors, exposed to the elements, and unsupervised, making them vulnerable to physical or network-based intrusions. Once a remote sensing device is compromised, it may send incorrect data or attack other devices, disrupting the near-earth remote sensing system. Therefore, it is essential to deploy a network intrusion detection system specifically designed for near-earth remote sensing systems.
A network intrusion detection system responds quickly when an IoT device is compromised, and its core is the network intrusion detection method. There are two main types of network intrusion detection methods: feature-based detection and machine learning-based detection [6]. Feature-based methods require a predefined set of attack features against which captured packets are compared; their accuracy depends entirely on the feature set, and they cannot respond well to new attacks. With the development of machine learning in recent years, machine learning-based detection methods have emerged. However, these methods use data flow information directly for identification [7-11]. Although they effectively avoid the dependence on predefined features and detect new attacks reasonably well, their identification accuracy is not high, and they are difficult to apply in practical networks. Recently, researchers have turned to a newer subfield of machine learning, graph neural networks (GNNs), and proposed GNN-based intrusion detection methods [6,12,13]. However, these methods have low identification accuracy in multi-classification tasks and cannot be effectively applied to near-earth remote sensing systems.
Because weather balloons, aircraft, drones, and other such devices run their own operating systems, a near-earth remote sensing system composed of these devices can provide richer information. This article therefore considers the characteristics of near-earth remote sensing systems and proposes a node-state-aware spatio-temporal graph attention network (N-STGAT), applying spatio-temporal graph neural networks [14] to network intrusion detection in near-earth remote sensing systems. To the best of our knowledge, this is the first time that spatio-temporal graph neural networks have been applied to network intrusion detection in such systems.
The contributions of this article are as follows:
Expanding the latest IoT network intrusion detection datasets by incorporating node status information, which better reflects the situation of near-earth remote sensing systems and enables better evaluation of the proposed method.
The article proposes for the first time the application of spatio-temporal graph neural networks to network intrusion detection in near-Earth remote sensing systems, and further improves the method to better align with the characteristics of near-Earth remote sensing systems, providing a new solution for network intrusion detection.
The proposed method is validated on the extended dataset and compared with some of the recent effective IoT network intrusion detection methods. The results show that the proposed method outperforms other methods.
The remainder of this article is organized as follows: Section 2 reviews existing work on network intrusion detection methods. Section 3 briefly introduces GNNs and LSTM and explains how the latest network intrusion detection datasets are extended. Section 4 describes the proposed network intrusion detection system in detail. Section 5 presents the experimental methods and results. Section 6 concludes the article.
2. Related Work
In recent years, many researchers have developed machine learning-based methods for network intrusion detection. However, most of these methods directly use data flow information for identification.
Casas et al. [15] proposed an unsupervised network intrusion detection system called UNIDS, which employs subspace clustering and multi-evidence accumulation for unsupervised outlier detection to detect unknown network attacks. It requires no signatures, labeled traffic, or training to detect various types of network attacks, such as DoS/DDoS, probing attacks, worm propagation, buffer overflows, and illegal access to network resources. Its effectiveness was demonstrated through experiments on KDD99 and real traffic from two operational networks. However, the KDD99 dataset is quite old and may not reflect the attack characteristics of modern IoT networks.
In [16], the authors proposed a hybrid anomaly mitigation framework for IoT based on fog computing to achieve faster and more accurate anomaly detection. The framework employs signature-based and anomaly-based detection methodologies in its two modules, respectively. Experiments on the BoT-IoT dataset validated the effectiveness of the proposed method, with 99% accuracy on both binary and multi-class classification problems and at least 97% average recall, average precision, and average F1-score.
The two methods mentioned above use traditional machine learning techniques for network intrusion detection, such as decision trees, MDAE, LSTM, random forests, and XGBoost. These methods train directly on the dataset without considering graph topology information, so each training sample is treated in isolation and the information it carries cannot be fully exploited. As a result, they have limited capability in detecting complex network attacks such as botnet attacks [17], distributed port scans [18], or DNS amplification attacks [19], which require a more global view of the network and its traffic.
Leichtnam et al. [20] introduced a graph representation called security objects' graphs, which links events of different kinds and allows a rich description of the analyzed activities. They proposed an unsupervised learning approach based on auto-encoders to detect anomalies in these graphs, hypothesizing that auto-encoders can build a relevant model of the normal situation from the rich view provided by security objects' graphs. Applied to the CICIDS2017 dataset, their unsupervised method performed as well as, or better than, many supervised approaches.
Because GNNs perform well on graph-structured data and a network is naturally a graph, some scholars have recently applied GNNs to intrusion detection systems.
Lo et al. [6] proposed the E-GraphSAGE architecture based on GraphSAGE, which captures the edge features and topology of the graph for network intrusion detection in the IoT. Experiments on four recent NIDS benchmark datasets show that the method achieves very good results on binary classification but performs less well on multi-class classification. When constructing the graph, this method directly uses the IP address and port of a network flow as nodes and the network flows as edges. During GraphSAGE neighborhood aggregation, some neighbors may be aggregated multiple times while the entire neighborhood is aggregated, which affects identification accuracy.
Cheng et al. [12] proposed the Alert-GCN framework, which uses graph convolutional networks (GCNs) to correlate alerts originating from the same attack for alert prediction. Experiments on the DARPA99 dataset achieved good results. However, this method uses a custom similarity measure to define the edges of the alert-correlation graph, which makes the neighbors of any node highly similar to the node itself. We argue that graphs constructed in this way cause the graph neural network to overfit.
Caville et al. [13] proposed an intrusion and anomaly detection method called Anomal-E based on E-GraphSAGE, which utilizes edge features and graph topology in a self-supervised process. They validated the proposed method on the NF-UNSW-NB15-v2 and NF-CSE-CIC-IDS2018-v2 datasets, achieving at least 97.5% accuracy and at least 92% F1-score. This was the first work to apply a self-supervised graph neural network solution to network intrusion detection.
Although the methods above apply graph neural networks to network intrusion detection, they do not take into account the characteristics of remote sensing systems, node status, or temporal information. This article therefore combines GAT and LSTM to better exploit the spatial information of the graph and the temporal information of the data flows, increasing the reliability of network intrusion detection.
3. Background
This article argues that when identifying data flows, the state of the nodes should be considered simultaneously. When a network attack occurs, the state of the attacking node is significantly different from that of a normal node. Therefore, considering the state of the nodes is very beneficial for identifying attacks. In addition, identifying attacks should also consider the topological relationship between data flows and the temporal information of data flows.
Intrusion detection systems are generally deployed at the network entry point in order to detect intrusions. However, with the development of the IoT, more and more networks are structured as shown in Figure 3, which this article refers to as an "IoT domain". Such IoT domains are increasingly prevalent in industrial IoT systems, such as those operated by companies like Apple, Huawei, and Xiaomi. Most devices in an IoT domain are monitored by a central server, which can obtain information about the status of the nodes. This situation is similar to that of an IoT-based near-earth remote sensing system. As it becomes more common for certain nodes within a domain to be controlled and used to attack other nodes, intrusion detection systems also need to be deployed at the exit points of the domain to detect attacks on nodes.
3.1. GNN and LSTM
3.1.1. GNN
GNNs are a type of neural network specifically designed to handle graph-structured data. In recent years, GNNs have been widely applied in graph processing [21], networking [22], intelligent transportation [23], recommendation systems [24], distributed computing [25], and other fields. A key feature of GNNs is that they can process non-Euclidean data and exploit topological structure through message passing. When identifying nodes in a graph, GNNs aggregate the features of neighboring nodes, allowing nodes to influence each other [26]. This process is called embedding [27]. There are many types of GNNs, among which the Graph Attention Network (GAT) [28], the Graph Convolutional Network (GCN) [29], and GraphSAGE [30] are particularly popular among researchers.
GAT is a graph neural network that incorporates an attention mechanism and aggregates neighborhood nodes by computing attention coefficients. This enables GAT to capture the influence of different nodes within the neighborhood. GAT has two core components: computing the attention coefficients and computing the hidden-layer features of nodes.
The calculation of the attention coefficient $\alpha_{ij}$ between neighboring nodes is shown in Figure 4(a). The original GAT article mentions two types of neighborhoods: node neighbors (nodes directly connected to the current node) and the full neighborhood (all nodes). In this article, we use node neighbors as the neighborhood, to prevent the aggregation of a large amount of neighborhood information from blurring the features; using node neighbors also reduces time and space complexity. The calculation formula is:

$$\alpha_{ij}=\frac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{T}\left[W h_{i}\,\|\,W h_{j}\right]\right)\right)}{\sum_{k\in\mathcal{N}_{i}}\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{T}\left[W h_{i}\,\|\,W h_{k}\right]\right)\right)} \quad (1)$$

where $a(\cdot)$ is a single-layer feedforward neural network parameterized by a weight vector $\mathbf{a}$, $W$ is a shared linear transformation matrix, $\mathrm{LeakyReLU}$ is a nonlinear function, and $j\in\mathcal{N}_{i}$ is a neighbor node of node $i$. The equation uses the softmax function for normalization.
The calculation of the hidden-layer features $h_{i}'$ is shown in Figure 4(b). The calculation formula is:

$$h_{i}'=\Big\Vert_{k=1}^{K}\,\sigma\Big(\sum_{j\in\mathcal{N}_{i}}\alpha_{ij}^{k}W^{k}h_{j}\Big) \quad (2)$$

where $\sigma$ is a nonlinear function. Since GAT uses a multi-head attention mechanism, $K$ represents the number of attention heads.
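To make Equations (1) and (2) concrete, the following minimal single-head GAT layer is sketched in PyTorch. This is an illustrative sketch only, not the authors' implementation: the dense adjacency matrix, the ELU output nonlinearity, and the class and argument names are assumptions made for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Minimal single-head GAT layer implementing Eq. (1)-(2) on a dense adjacency matrix."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared linear transformation W
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # single-layer feedforward network a

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: [N, in_dim] node features; adj: [N, N] binary adjacency with self-loops
        h = self.W(x)                                    # W h_i for every node
        n = h.size(0)
        h_i = h.unsqueeze(1).expand(n, n, -1)            # rows: W h_i
        h_j = h.unsqueeze(0).expand(n, n, -1)            # columns: W h_j
        e = F.leaky_relu(self.a(torch.cat([h_i, h_j], dim=-1)).squeeze(-1))
        e = e.masked_fill(adj == 0, float("-inf"))       # keep only node neighbors
        alpha = torch.softmax(e, dim=1)                  # softmax normalization of Eq. (1)
        return F.elu(alpha @ h)                          # neighborhood aggregation of Eq. (2)
```

Sparse message-passing libraries avoid the O(N^2) pairwise concatenation used here; the dense form is kept only because it mirrors the equations line by line.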
3.1.2. LSTM
LSTM is a special type of recurrent neural network (RNN) architecture used for processing and predicting sequence data [31]. It uses gating mechanisms to better capture long-term dependencies in time series data. An LSTM network consists of a series of LSTM cells, each containing an input gate, a forget gate, and an output gate. The input gate determines which information should be written to the memory cell, the forget gate decides which information should be discarded from the memory cell, and the output gate determines how much influence the memory cell has on the current output. The core cell unit of LSTM is shown in Figure 5.
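For reference, the gate computations of a standard LSTM cell can be written as follows; this is the conventional formulation from the LSTM literature rather than anything specific to this article, with $\sigma$ the sigmoid function and $\odot$ element-wise multiplication:

$$\begin{aligned}
f_{t} &= \sigma\left(W_{f}\,[h_{t-1},x_{t}]+b_{f}\right) &&\text{(forget gate)}\\
i_{t} &= \sigma\left(W_{i}\,[h_{t-1},x_{t}]+b_{i}\right) &&\text{(input gate)}\\
\tilde{C}_{t} &= \tanh\left(W_{C}\,[h_{t-1},x_{t}]+b_{C}\right) &&\text{(candidate cell state)}\\
C_{t} &= f_{t}\odot C_{t-1}+i_{t}\odot\tilde{C}_{t} &&\text{(cell state update)}\\
o_{t} &= \sigma\left(W_{o}\,[h_{t-1},x_{t}]+b_{o}\right) &&\text{(output gate)}\\
h_{t} &= o_{t}\odot\tanh\left(C_{t}\right) &&\text{(hidden state)}
\end{aligned}$$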
3.2. Datasets
In order to describe the proposed method more clearly, it is necessary to first introduce the datasets used in this article. The datasets used are NF-BoT-IoT-v2 and NF-ToN-IoT-v2 [32], and their basic information is shown in Table 1.
Table 1. Basic information of the datasets.

Dataset | Release year | No. Classes | No. features | No. data | Benign ratio
NF-BoT-IoT-v2 | 2021 | 5 | 43 | 37,763,497 | 0.0 to 10.0
NF-ToN-IoT-v2 | 2021 | 10 | 43 | 16,940,496 | 3.6 to 6.4
These two datasets are improved versions of the original BoT-IoT [33] and ToN-IoT [34] datasets, in which the network data are integrated in a flow-based fashion to form complete network behaviors.
The features in the aforementioned datasets are referred to as flow features, denoted FI in this article. Because the proposed method requires node status information, 14 additional features were added to these two datasets based on the original data sources, as listed in Table 2. This article refers to the features in Table 2 as node features, denoted NI. With the above fields, the dataset is expanded into two parts (FI and NI), with a total of 57 fields.
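A sketch of how such an extension can be assembled is given below. The file names, the NODE_IP and FLOW_START_MS column names, and the rule of attaching the most recent status sample of the source node to each flow are assumptions made for illustration; they are not the authors' exact procedure.

```python
import pandas as pd

# Flow records carrying the 43 FI features, and periodically sampled node-status
# records carrying the 14 NI features of Table 2 (file names are assumed).
flows = pd.read_csv("NF-BoT-IoT-v2.csv")
status = pd.read_csv("node_status.csv")

# Attach to every flow the latest status sample of its source node taken at or
# before the moment the flow was sent (assumed matching rule).
extended = pd.merge_asof(
    flows.sort_values("FLOW_START_MS"),
    status.sort_values("TIMESTAMP"),
    left_on="FLOW_START_MS", right_on="TIMESTAMP",   # assumed time keys
    left_by="IPV4_SRC_ADDR", right_by="NODE_IP",     # assumed node keys
    direction="backward",
)
# The merged frame now carries the FI and NI fields side by side.
```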
4. Method Description
4.1. Problem Definition
Suppose there are $N$ nodes in a near-earth remote sensing system, defined as $V=\{v_{1},v_{2},\ldots,v_{N}\}$. The data flows sent by each node $v_{i}$ are defined in chronological order as $F_{i}=\{f_{i}^{1},f_{i}^{2},\ldots,f_{i}^{t},\ldots\}$, where $t$ is the sequential number of the data flow. The purpose of this article is to identify the attack type $y_{i}^{t}$ of the data flow $f_{i}^{t}$ sent by node $v_{i}$, as shown in Figure 6.
To solve the aforementioned issues, this article proposes the N-STGAT network intrusion detection system. As the goal of this article is to solve network intrusion detection problems in time series, a spatiotemporal graph neural network combining GAT and LSTM is used, which has good performance in identifying data flows with spatiotemporal sequences.
Figure 7 illustrates the system flow of N-STGAT. First, the extended dataset is preprocessed, and then a training graph $G_{\mathrm{train}}$ and a testing graph $G_{\mathrm{test}}$ are generated based on the graph construction rules. $G_{\mathrm{train}}$ is then fed into the training process of N-STGAT to obtain a trained model. Finally, $G_{\mathrm{test}}$ is input into the model for data flow classification.
4.2. Pre-processing and Graph Construction
To input the training data into GAT, the dataset needs to be transformed into a graph structure. Before constructing the graph, the dataset must be formatted so that it can be applied to GAT seamlessly. Flow-based datasets are collected through network monitoring tools, which capture and save packets passing through nodes, switches, and routers; the packets are then aggregated into flows using analysis tools. These datasets contain a wealth of important network information, including source and destination IP addresses, source and destination ports, byte counts, and other useful packet information. However, some data formats in the dataset cannot be used directly for training, such as enumeration types, null values, and invalid values.
Figure 8 shows the data processing process.
The dataset is grouped by source IP address, and each group is sorted by the time sequence of its data flows; in this way, the dataset is composed of thousands of chains of data flows. The data from time $t_{0}$ to $t_{m}$ are selected as the training dataset, and the data from $t_{m}$ to $t_{n}$ are selected as the testing dataset, as shown in Figure 6. Useless features such as IP addresses and ports are removed. Invalid and empty values in the dataset are set to 0. Since the values of some fields span a large range, quantization is applied. Finally, the remaining fields are normalized using L2 normalization.
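The preprocessing steps above can be summarized by the following pandas/scikit-learn sketch; the column names and the row-wise application of L2 normalization are assumptions made for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import normalize

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Group flows by source node and sort each chain chronologically.
    df = df.sort_values(["IPV4_SRC_ADDR", "TIMESTAMP"])
    labels = df[["Label", "Attack"]]
    # Drop identifier fields that are not used as features.
    feats = df.drop(columns=["IPV4_SRC_ADDR", "L4_SRC_PORT",
                             "IPV4_DST_ADDR", "L4_DST_PORT",
                             "Label", "Attack"])
    # Enumeration types, invalid values, and empty values are set to 0.
    feats = feats.apply(pd.to_numeric, errors="coerce")
    feats = feats.replace([np.inf, -np.inf], np.nan).fillna(0.0)
    # L2-normalize the remaining fields (row-wise here; per-column is also plausible).
    feats = pd.DataFrame(normalize(feats.values, norm="l2"),
                         columns=feats.columns, index=feats.index)
    return pd.concat([feats, labels], axis=1)
```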
Once the datasets for training and testing are selected, graphs are built according to specific rules. Graph construction amounts to finding neighbors for each node, so describing the neighbor-finding process for one node is enough to construct the entire graph. For a data flow $A$, this process is divided into two parts: finding the similarity-based neighborhood $N_{1}(A)$ and finding the time-based neighborhood $N_{2}(A)$.

$N_{1}(A)$ is obtained using the cosine similarity of the NI fields of the dataset, as shown in formula (3). The method is as follows: let $t_{A}$ be the time at which flow $A$ exists, and for any other source node let $B$ be its latest flow before time $t_{A}$; calculate the cosine similarity $\mathrm{sim}(A,B)$ between $A$ and $B$, and if $\mathrm{sim}(A,B)$ is greater than 0.7, add $B$ to $N_{1}(A)$:

$$\mathrm{sim}(A,B)=\frac{\sum_{i}A_{i}B_{i}}{\sqrt{\sum_{i}A_{i}^{2}}\,\sqrt{\sum_{i}B_{i}^{2}}} \quad (3)$$

where $A_{i}$ represents the $i$-th feature value in $A$'s NI field and $B_{i}$ represents the $i$-th feature value in $B$'s NI field.

For $N_{2}(A)$, all flows (including those already in $N_{1}(A)$) within the interval ($t_{A}-\Delta t$, $t_{A}$) are neighbors of $A$, and these flows form the time-based neighborhood $N_{2}(A)$. Each neighbor $C$ is then connected to $A$ by an edge $E$, which is a directed edge from $C$ to $A$. In this article, $\Delta t$ is set to 3 s when constructing the graph using the method described above, resulting in the graph structure of formula (4):

$$G=(V,E),\qquad E=\{(C,A)\mid C\in N_{1}(A)\cup N_{2}(A)\} \quad (4)$$

where the data flows are the nodes $V$ and the directed connections described above are the edges $E$.
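The neighbor-finding rule, as read above, can be sketched as follows; the function and variable names are illustrative and the implementation is deliberately naive:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def build_edges(times, sources, ni, delta_t=3.0, thr=0.7):
    """times: flow timestamps sorted ascending; sources: source-node id of each flow;
    ni: matrix of NI feature vectors. Returns directed edges (neighbor, flow)."""
    edges = set()
    last_by_source = {}      # most recent earlier flow index for each source node
    start = 0
    for a in range(len(times)):
        # Time-based neighbors: every earlier flow inside the 3-second window.
        while times[a] - times[start] > delta_t:
            start += 1
        for b in range(start, a):
            edges.add((b, a))
        # Similarity-based neighbors: the latest earlier flow of every other source
        # whose NI vector has cosine similarity above 0.7 with this flow's NI vector.
        for src, b in last_by_source.items():
            if src != sources[a] and cosine(ni[a], ni[b]) > thr:
                edges.add((b, a))
        last_by_source[sources[a]] = a
    return sorted(edges)
```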
4.3. N-STGAT Training
The dataset contains two labels, "Label" and "Attack". "Label" indicates whether a data flow is an attack or not, while "Attack" defines the specific type of the data flow. The labels can therefore be used for binary and multi-class classification training, respectively.
N-STGAT is described in Algorithm 1, whose pseudocode shows how GAT and LSTM are integrated. In lines 3 to 6, GAT is used to calculate the hidden-layer feature $h_{v}$. In line 7, $h_{v}$ is assigned to the LSTM input $x_{t}$ so that it participates in the LSTM calculation. Lines 8 to 13 represent the LSTM calculation process. Finally, in line 16, the output $C$ of the LSTM is fed into a fully connected layer to obtain the final recognition result $y_{v}$.
Algorithm 1: Pseudocode of the N-STGAT algorithm.
Input: graph $G(V,E)$; node features $\{x_{v},\ v\in V\}$; GAT weight matrix $W$ and attention vector $\mathbf{a}$; non-linearity $\sigma$; LSTM weight matrices $W_{f},W_{i},W_{C},W_{o}$; LSTM initialization $h_{0},C_{0}$
Output: node classification results $\{y_{v},\ v\in V\}$
1   for $t=1$ to $T$ do
2       for $v=1$ to length($V$) do
3           for $u=1$ to length($\mathcal{N}(v)$) do
4               compute the attention coefficient $\alpha_{vu}$ (Eq. (1), constrained by the cosine similarity of Eq. (3))
5           end
6           $h_{v}\leftarrow\sigma\big(\sum_{u\in\mathcal{N}(v)}\alpha_{vu}Wx_{u}\big)$
7           $x_{t}\leftarrow h_{v}$
8           $f_{t}\leftarrow\sigma(W_{f}[h_{t-1},x_{t}]+b_{f})$
9           $i_{t}\leftarrow\sigma(W_{i}[h_{t-1},x_{t}]+b_{i})$
10          $\tilde{C}_{t}\leftarrow\tanh(W_{C}[h_{t-1},x_{t}]+b_{C})$
11          $C_{t}\leftarrow f_{t}\odot C_{t-1}+i_{t}\odot\tilde{C}_{t}$
12          $o_{t}\leftarrow\sigma(W_{o}[h_{t-1},x_{t}]+b_{o})$
13          $h_{t}\leftarrow o_{t}\odot\tanh(C_{t})$
14      end
15  end
16  $y_{v}\leftarrow\mathrm{FC}(C)$   // FC is a fully connected layer
The algorithm uses only one GAT layer instead of multiple layers, because stacking layers blurs the features and is not conducive to recognition. In addition, in the attention coefficient calculation in line 4, the cosine similarity between two nodes is added as a constraint, so that the graph information constructed in Section 4.2 is better expressed.
For each dataset used in the experiments, we trained with the hyperparameters shown in Table 3 to obtain the best model. As mentioned above, a single-layer GAT model is used instead of a multi-layer one. To better represent node features, the original 53 features are expanded to 256 hidden-layer features. GAT supports multi-head attention, but only single-head attention is used here, since cosine similarity is introduced as a constraint in the attention coefficient calculation. ReLU is used as the activation function, and no dropout is applied. The cross-entropy function is used to calculate the training loss, and Adam is used for optimization with a learning rate of 0.002.
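A minimal training loop reflecting the Table 3 settings is sketched below. The model argument is assumed to be any module that chains the GAT layer of Section 3.1.1 with an LSTM cell as in Algorithm 1; the epoch count and the dense-adjacency interface are illustrative assumptions.

```python
import torch
import torch.nn as nn

def train(model: nn.Module, x, adj, labels, epochs: int = 800) -> nn.Module:
    """Cross-entropy loss and Adam with learning rate 0.002, as in Table 3."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = model(x, adj)              # forward pass over the training graph
        loss = criterion(logits, labels)    # training loss
        loss.backward()                     # backpropagation
        optimizer.step()
    return model
```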
4.4. N-STGAT Detection
In the previous section, the N-STGAT model was obtained through training. In the detection task, the test graph is input into the trained model. Line 16 of Algorithm 1 uses a fully connected layer to determine the specific class. For both binary and multi-class problems, the softmax function is applied in the final layer of the fully connected network to obtain the classification result.
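Under the same assumptions as the training sketch above, detection reduces to a forward pass followed by a softmax over the fully connected output:

```python
import torch

@torch.no_grad()
def detect(model, x, adj):
    """Apply the trained model to the test graph and return one class per data flow."""
    model.eval()
    logits = model(x, adj)
    probs = torch.softmax(logits, dim=1)   # softmax in the final fully connected layer
    return probs.argmax(dim=1)             # predicted class (binary or multi-class)
```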
5. Experimental Evaluation
5.1. Evaluation Metrics
To evaluate the performance of N-STGAT, experiments were conducted using the datasets described in Section 3.2. Due to their large size, 10% of each dataset was selected using the method described in Section 4.2 (see Figure 9), with 70% of the experimental data used for training and 30% for testing. Since only a portion of each dataset was used, the experiments were repeated multiple times for each algorithm and dataset.
In the experiments, the proposed method was compared with graph neural network-based methods (GAT, E-GraphSAGE, Anomal-E) and traditional machine learning methods (SVM, Random Forest). The evaluation metrics shown in Table 4 were used to assess the performance of all methods, where TP, FP, FN, and TN denote True Positive, False Positive, False Negative, and True Negative in the confusion matrix, respectively. These metrics allow a comprehensive comparison of the methods.
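These metrics can be computed directly from the predictions, for example with scikit-learn; this convenience sketch is not part of the proposed method, and average="weighted" corresponds to the weighted scores reported for the multi-class tables:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def report(y_true, y_pred, average="binary"):
    """Compute the Table 4 metrics; use average='weighted' for multi-class results."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=average, zero_division=0)
    return {"Recall": rec, "Precision": prec, "F1-Score": f1, "Accuracy": acc}
```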
5.2. Result
In the experiments, the accuracy and loss changes during training were first recorded, and then the recognition results for both binary and multi-class classification were evaluated using the performance metrics in Section 5.1.
5.2.1. Loss and Accuracy Comparison in Training
The training accuracy changes for multi-class classification on the different datasets are shown in Figure 10. The training accuracy on NF-BoT-IoT-v2 quickly reached 90% within 20 epochs and remained stable above 95% after 100 epochs; its validation accuracy was lower than the training accuracy throughout training but also stabilized above 92%. The training accuracy on NF-ToN-IoT-v2 was slightly lower, growing continuously until about 600 epochs and stabilizing above 93% after 700 epochs; its validation accuracy was likewise lower than the training accuracy throughout training but stabilized above 90%.
The cross-entropy loss during multi-class training on the different datasets is shown in Figure 11. For NF-BoT-IoT-v2, the training loss decreased rapidly to 0.6 within 100 epochs and stabilized around 0.4 after 200 epochs; the validation loss was higher than the training loss throughout training but eventually stabilized at about 0.7. For NF-ToN-IoT-v2, the training loss was slightly worse, decreasing continuously until about 600 epochs and stabilizing at 0.4 after 700 epochs; the validation loss was higher than the training loss throughout training but stabilized at about 0.6.
5.2.2. Binary Classification Results
The results of the binary classification experiment using the "Label" label of the datasets are shown in Table 5. In the binary classification task, the proposed method outperforms the other five methods in terms of recall and accuracy, indicating better recognition accuracy. The proposed method also performs well in precision, only 0.4% lower than E-GraphSAGE on the NF-ToN-IoT-v2 dataset and higher than the remaining methods. In terms of F1-score, the proposed method outperforms the other five methods, indicating better stability.
5.2.3. Multiclass Classification Results
The results of the multi-class classification experiment using the "Attack" label of the datasets are shown in Table 6 and Table 7. The results show that the proposed method is superior on multi-class problems, with high recognition rates for each category in the different datasets. On NF-BoT-IoT-v2, the proposed method improves Weighted Recall by 3.53%-20.49% and Weighted F1-Score by 8.03%-23.16% compared with the other methods. On NF-ToN-IoT-v2, it improves Weighted Recall by 4.05%-18.69% and Weighted F1-Score by 7.17%-17.78%.
6. Conclusions
This article proposes a spatio-temporal graph attention network (N-STGAT) that considers node states and applies it to network intrusion detection in near-earth remote sensing systems. A method is proposed for constructing graphs based on node state and the temporal characteristics of data flows, where data flows are viewed as nodes in the graph and edges between nodes are constructed based on cosine similarity and time-series features. A spatio-temporal graph neural network combining GAT and LSTM is applied to intrusion detection. Finally, experiments are conducted using the latest flow-based network intrusion detection datasets, and the proposed method is compared with existing methods to demonstrate its superiority and feasibility.
Author Contributions
Conceptualization, Yalu Wang and Jie Li; Data curation, Hang Zhao; Formal analysis, Zhijie Han; Investigation, Zhijie Han; Methodology, Yalu Wang and Wei Zhao; Project administration, Jie Li; Resources, Yalu Wang; Software, Zhijie Han; Supervision, Xin He; Validation, Yalu Wang, Jie Li and Wei Zhao; Writing – original draft, Yalu Wang; Writing – review & editing, Wei Zhao, Lei Wang and Xin He. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Acknowledgments
This work was supported by the Key Science and Technology Project of Henan Province (201300210400).
Conflicts of Interest
The authors declare no conflict of interest.
References
- Wang, D.; Zhang, J.; Du, B.; Xia, G.-S.; Tao, D. An Empirical Study of Remote Sensing Pretraining. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5608020. [Google Scholar] [CrossRef]
- Goswami, A.; Sharma, D.; Mathuku, H.; Gangadharan, S.M.P.; Yadav, C.S.; Sahu, S.K.; Pradhan, M.K.; Singh, J.; Imran, H. Change Detection in Remote Sensing Image Data Comparing Algebraic and Machine Learning Methods. Electronics 2022, 11, 431. [Google Scholar] [CrossRef]
- Sun, X.; Zhang, Y.; Shi, K.; et al. Monitoring water quality using proximal remote sensing technology. Sci. Total Environ. 2022, 803, 149805. [Google Scholar] [CrossRef]
- Chen, J.; Chen, S.; Fu, R.; et al. Remote sensing big data for water environment monitoring: Current status, challenges, and future prospects[J]. Earth's Future 2022, 10, e2021EF002289. [Google Scholar] [CrossRef]
- Li, J.; Hong, D.; Gao, L.; et al. Deep learning in multimodal remote sensing data fusion: A comprehensive review[J]. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102926. [Google Scholar] [CrossRef]
- Lo, W.W.; Layeghy, S.; Sarhan, M.; Gallagher, M.; Portmann, M. E-GraphSAGE: A Graph Neural Network based Intrusion Detection System for IoT, NOMS 2022-2022 IEEE/IFIP. Netw. Oper. Manag.Symp.Bp. Hung. 2022, 1–9. [Google Scholar] [CrossRef]
- He, H.; Sun, X.; He, H.; Zhao, G.; He, L.; Ren, J. A Novel Multimodal-Sequential Approach Based on Multi-View Features for Network Intrusion Detection. IEEE Access 2019, 7, 183207–183221. [Google Scholar] [CrossRef]
- Lawal, M.A.; Shaikh, R.A.; Hassan, S.R. An Anomaly Mitigation Framework for IoT Using Fog Computing. Electronics 2020, 9, 1565. [Google Scholar] [CrossRef]
- Sarhan, M.; Layeghy, S.; Moustafa, N.; Portmann, M. NetFlow Datasets for Machine Learning-Based Network Intrusion Detection Systems. In: Deze, Z., Huang, H., Hou, R., Rho, S., Chilamkurti, N. (eds) Big Data Technologies and Applications. BDTA WiCON, 2021. [Google Scholar] [CrossRef]
- Kumar, P.; Gupta, G.P.; Tripathi, R. An ensemble learning and fog-cloud architecture-driven cyber-attack detection framework for IoMT networks[J]. Comput. Commun. 2021, 166, 110–124. [Google Scholar] [CrossRef]
- Churcher, A.; Ullah, R.; Ahmad, J.; ur Rehman, S.; Masood, F.; Gogate, M.; Alqahtani, F.; Nour, B.; Buchanan, W.J. An Experimental Analysis of Attack Classification Using Machine Learning in IoT Networks. Sensors 2021, 21, 446. [Google Scholar] [CrossRef]
- Cheng, Q.; Wu, C.; Zhou, S. Discovering Attack Scenarios via Intrusion Alert Correlation Using Graph Convolutional Networks. IEEE Commun. Lett. 2021, 25, 1564–1567. [Google Scholar] [CrossRef]
- Caville, E.; Lo, W.W.; Layeghy, S.; et al. Anomal-E: A self-supervised network intrusion detection system based on graph neural networks[J]. Knowl. -Based Syst. 2022, 258, 110030. [Google Scholar] [CrossRef]
- Huang, Y.; Bi, H.; Li, Z.; Mao, T.; Wang, Z. STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019; pp. 6272–6281. https://openaccess.thecvf.com/content_ICCV_2019/html/Huang_STGAT_Modeling_Spatial-Temporal_Interactions_for_Human_Trajectory_Prediction_ICCV_2019_paper.html.
- Casas, P.; Mazel, J.; Owezarski, P. Unsupervised network intrusion detection systems: Detecting the unknown without knowledge[J]. Comput. Commun. 2012, 35, 772–783. [Google Scholar] [CrossRef]
- Lawal, M.A.; Shaikh, R.A.; Hassan, S.R. An Anomaly Mitigation Framework for IoT Using Fog Computing. Electronics 2020, 9, 1565. [Google Scholar] [CrossRef]
- Vormayr, G.; Zseby, T.; Fabini, J. Botnet Communication Patterns. IEEE Commun. Surv. Tutor. 2017, 19, 2768–2796. [Google Scholar] [CrossRef]
- Bhuyan, M.H.; Bhattacharyya, D.K.; Kalita, J.K. Surveying Port Scans and Their Detection Methodologies. Comput. J. 2011, 54, 1565–1581. [Google Scholar] [CrossRef]
- Kambourakis, G.; Moschos, T.; Geneiatakis, D.; Gritzalis, S. Detecting DNS Amplification Attacks. In: Lopez, J., Hämmerli, B.M. (eds) Critical Information Infrastructures Security. CRITIS, 5141. [Google Scholar] [CrossRef]
- Leichtnam, L.; Totel, E.; Prigent, N.; Mé, L. Sec2graph: Network Attack Detection Based on Novelty Detection on Graph Structured Data. In: Maurice, C., Bilge, L., Stringhini, G., Neves, N. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA, 1222. [Google Scholar] [CrossRef]
- Hao, J.; Liu, J.; Pereira, E.; et al. Uncertainty-guided graph attention network for parapneumonic effusion diagnosis[J]. Med. Image Anal. 2022, 75, 102217. [Google Scholar] [CrossRef]
- Jiang, W. Graph-based deep learning for communication networks: A survey[J]. Comput. Commun. 2022, 185, 40–54. [Google Scholar] [CrossRef]
- Jiang, W.; Luo, J. Graph neural network for traffic forecasting: A survey[J]. Expert Syst. Appl. 2022, 117921. [Google Scholar] [CrossRef]
- He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20). Association for Computing Machinery, New York, NY, USA; 2020; pp. 639–648. [Google Scholar] [CrossRef]
- Sun, P.; Guo, Z.; Wang, J.; Li, J.; Lan, J.; Hu, Y. DeepWeave: Accelerating job completion time with deep reinforcement learning-based coflow scheduling. Int. Jt. Conf. Artif. Intell. 2021; 3314–3320. [Google Scholar]
- Xu, K.; Hu, W.; Leskovec, J.; et al. How powerful are graph neural networks?[J]. arXiv 2018, arXiv:1810.00826. [Google Scholar] [CrossRef]
- Cai, H.; Zheng, V.W.; Chang, K.C.-C. A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications. IEEE Trans. Knowl. Data Eng. 2018, 30, 1616–1637. [Google Scholar] [CrossRef]
- Veličković, P.; Cucurull, G.; Casanova, A.; et al. Graph attention networks[J]. arXiv 2017, arXiv:1710.10903. [Google Scholar] [CrossRef]
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [CrossRef]
- Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs[J]. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Smagulova, K.; James, A.P. A survey on LSTM memristive neural network architectures and applications. Eur. Phys. J. Spec. Top. 2019, 228, 2313–2324. [Google Scholar] [CrossRef]
- Sarhan, M.; Layeghy, S.; Portmann, M. Towards a Standard Feature Set for Network Intrusion Detection System Datasets. Mobile Netw Appl 2022, 27, 357–370. [Google Scholar] [CrossRef]
- Koroniotis, N.; Moustafa, N.; Sitnikova, E.; et al. Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: Bot-iot dataset[J]. Future Gener. Comput. Syst. 2019, 100, 779–796. [Google Scholar] [CrossRef]
- Moustafa, N. A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets[J]. Sustain. Cities Soc. 2021, 72, 102994. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram of near-ground remote sensing system based on the Internet of Things.
Figure 2. Example of near-ground remote sensing technology in agriculture.
Figure 3. Characteristics of IoT structure.
Figure 4. (a) The calculation process of attention coefficient; (b) The calculation process of hidden layer features.
Figure 5. The core cell unit of LSTM.
Figure 6. Problem Definition Structure.
Figure 7. The workflow of the proposed N-STGAT intrusion detection system. First, the dataset is preprocessed, and a graph is constructed for training and testing based on time-axis relationships and similarity rules (left). N-STGAT is used to train the model on the training graph, and the trained model is output (middle). Finally, the generated testing graph is input into the trained model for intrusion detection classification (right).
Figure 8. Data preprocessing process.
Figure 9. Random selection of datasets.
Figure 10. The accuracy changes on different datasets. (a) Dataset NF-BoT-IoT-v2. (b) Dataset NF-ToN-IoT-v2.
Figure 11. The loss changes on different datasets. (a) Dataset NF-BoT-IoT-v2. (b) Dataset NF-ToN-IoT-v2.
Table 2. Details of the newly added features.

Feature | Description
TIMESTAMP | The timestamp when the data flow is sent
PROCESS_LOAD | 1-minute load average
PROCESS_ID | Idle CPU percentage
PROCESS_HI | Hard-interrupt CPU percentage
PROCESS_US | User-space CPU percentage
PROCESS_SY | Kernel-space CPU percentage
MEMORY_USED | Memory used ratio
MEMORY_BUFFER | Memory cache ratio
MEMORY_NETWORK | Ratio of memory used by the network module
NET_PAKAGES | Number of packets sent
NET_BANDWIDTH_OUT | Network egress bandwidth
NET_TCP_CONNECTIONS | Number of TCP connections
DISK_READ | Disk read speed
DISK_WRITE | Disk write speed
Table 3. Hyperparameter values used in N-STGAT.

Hyperparameter | Value
No. Layers | 1
No. Hidden | 256
No. K (attention heads) | 1
Learning Rate | 0.002
Activation Func. | ReLU
Loss Func. | Cross-Entropy
Optimiser | Adam
Table 4. Performance comparison metrics for experiments.

Metric | Definition
Recall | TP / (TP + FN)
Precision | TP / (TP + FP)
F1-Score | 2 × (Precision × Recall) / (Precision + Recall)
Accuracy | (TP + TN) / (TP + TN + FP + FN)
Table 5. Results of each method on binary classification.

DataSet | Algorithm | Recall | Precision | F1-Score | Accuracy
NF-BoT-IoT-v2 | SVM | 0.8485 | 0.9367 | 0.8904 | 0.8299
NF-BoT-IoT-v2 | Random Forest | 0.8212 | 0.9151 | 0.8656 | 0.7923
NF-BoT-IoT-v2 | GAT | 0.9013 | 0.9633 | 0.9313 | 0.8917
NF-BoT-IoT-v2 | E-GraphSAGE | 0.9615 | 0.9825 | 0.9719 | 0.9547
NF-BoT-IoT-v2 | Anomal-E | 0.9412 | 0.9859 | 0.963 | 0.9412
NF-BoT-IoT-v2 | N-STGAT | 0.9812 | 0.9927 | 0.9869 | 0.9788
NF-ToN-IoT-v2 | SVM | 0.7689 | 0.8991 | 0.8289 | 0.7415
NF-ToN-IoT-v2 | Random Forest | 0.7368 | 0.9307 | 0.8224 | 0.7409
NF-ToN-IoT-v2 | GAT | 0.8746 | 0.9724 | 0.9209 | 0.8776
NF-ToN-IoT-v2 | E-GraphSAGE | 0.9578 | 0.9867 | 0.972 | 0.9551
NF-ToN-IoT-v2 | Anomal-E | 0.9461 | 0.9846 | 0.965 | 0.9441
NF-ToN-IoT-v2 | N-STGAT | 0.9755 | 0.9827 | 0.9791 | 0.9661
Table 6. Performance results for each algorithm over multiple classifications.

DataSet | Algorithm | Weighted Recall | Weighted F1-Score
NF-BoT-IoT-v2 | SVM | 0.7101 | 0.6948
NF-BoT-IoT-v2 | Random Forest | 0.7719 | 0.7492
NF-BoT-IoT-v2 | GAT | 0.7227 | 0.7021
NF-BoT-IoT-v2 | E-GraphSAGE | 0.8797 | 0.8461
NF-BoT-IoT-v2 | Anomal-E | 0.865 | 0.8016
NF-BoT-IoT-v2 | N-STGAT | 0.915 | 0.9264
NF-ToN-IoT-v2 | SVM | 0.7195 | 0.7514
NF-ToN-IoT-v2 | Random Forest | 0.7786 | 0.7364
NF-ToN-IoT-v2 | GAT | 0.7699 | 0.8163
NF-ToN-IoT-v2 | E-GraphSAGE | 0.8533 | 0.8204
NF-ToN-IoT-v2 | Anomal-E | 0.8659 | 0.8425
NF-ToN-IoT-v2 | N-STGAT | 0.9064 | 0.9142
Table 7. Recall results per class on the two datasets.

Dataset: NF-BoT-IoT-v2
Algorithm | Benign | RN | DDoS | DoS | Theft
SVM | 0.7154 | 0.6148 | 0.8412 | 0.8649 | 0.7225
Random Forest | 0.8205 | 0.8415 | 0.7451 | 0.5748 | 0.8148
GAT | 0.8952 | 0.6715 | 0.8216 | 0.7469 | 0.7149
E-GraphSAGE | 0.8756 | 0.8912 | 0.82465 | 0.9051 | 0.8694
Anomal-E | 0.9049 | 0.8648 | 0.7903 | 0.9417 | 0.8795
N-STGAT | 0.9506 | 0.9241 | 0.8786 | 0.9207 | 0.9513

Dataset: NF-ToN-IoT-v2
Algorithm | Benign | RN | DDoS | DoS | Backdoor | Injection | MITM | Password | Scanning | XSS
SVM | 0.7147 | 0.5792 | 0.8106 | 0.7129 | 0.6129 | 0.6792 | 0.8059 | 0.7138 | 0.846 | 0.7816
Random Forest | 0.8703 | 0.7126 | 0.6109 | 0.5498 | 0.7159 | 0.8619 | 0.7469 | 0.8759 | 0.7482 | 0.6874
GAT | 0.8761 | 0.7454 | 0.8418 | 0.7923 | 0.8219 | 0.7619 | 0.8242 | 0.6958 | 0.8716 | 0.9015
E-GraphSAGE | 0.9418 | 0.8819 | 0.9112 | 0.8109 | 0.8846 | 0.7805 | 0.8759 | 0.8904 | 0.8513 | 0.8927
Anomal-E | 0.8042 | 0.9229 | 0.8496 | 0.8217 | 0.9036 | 0.9158 | 0.8176 | 0.9013 | 0.7219 | 0.9014
N-STGAT | 0.9712 | 0.9013 | 0.9013 | 0.8735 | 0.9213 | 0.9016 | 0.9254 | 0.9186 | 0.8619 | 0.9208
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).