2.1. Study Area
Lake Nokoué (Figure 1) is located in southeastern Benin (6°25′N, 2°36′E), covering an area that varies between 150 km² during the low-water period and 170 km² during the high-water period [26,27,28,29]. It stretches approximately 20 km from east to west along the coast and 11 km from south to north, as confirmed by several authors [27,28]. The average and maximum depths of the lake are approximately 1.3 m and 2.9 m, respectively. Towards the Cotonou channel, Lake Nokoué deepens, with average and maximum depths reaching around 3 m and 8 m, respectively. Two rivers flow into Lake Nokoué: on its northern bank, the Sô-Ava river, which drains a watershed of approximately 10,000 km², and the Ouémé river, the largest river in Benin, which drains a watershed of approximately 50,000 km² [26,28]. The Djonou river, smaller in extent and flow, also contributes freshwater in the southwestern part of Lake Nokoué. In the south, Lake Nokoué is connected to the Atlantic Ocean through the Cotonou channel, which is 280 m wide and approximately 4 km long [26,29]. Through this channel, constructed in 1885, freshwater and saltwater are exchanged according to the tides and the hydrological regime [1]. The Tochè canal, approximately 4 km long, connects the Porto Novo lagoon (area of approximately 35 km²) to Lake Nokoué on its eastern side, with little effect on the lake's dynamics.
At the seasonal scale, the hydrological regime of Lake Nokoué is determined by the West African summer monsoon, which produces two rainy seasons and two dry seasons [1]. These seasons are linked to the north-south movement of the Intertropical Convergence Zone (ITCZ) and its associated belt of intense tropical rainfall. The main rainy season extends from April-May to the end of July, when the ITCZ moves northward from its southern position near the equator. The second rainy season, shorter and less intense, occurs from late September to November, when the ITCZ migrates southward from its northernmost position. However, this dual rainy season and the associated local precipitation have only a weak influence on the water level of Lake Nokoué. The lake is influenced far more by the hydrology of central Benin, where the main basin of the Ouémé river lies [29]. This region is characterized by a single rainy season, with peak precipitation between July and October [29,30]; the maximum inflow to Lake Nokoué occurs from September to October.
2.4. Structure of the Long Short-Term Memory Model
The Long Short-Term Memory (LSTM) cell is an enhancement of the model proposed in [32]. Viewed as a black box, the LSTM cell, widely used in time-series forecasting, is an artificial recurrent neural network (RNN) architecture capable of memorizing the temporal order of data. The two main problems of the plain RNN architecture are gradient instability and an inability to retain information from long temporal sequences; the LSTM overcomes both through the state of its cells and its gates [33]. As a deep-learning predictive model, the LSTM cell receives the latent states from the previous time step and regulates its own memory, which offers better performance in time-series forecasting. The internal structure of the LSTM consists of three main gates that control the flow of information: (i) the forget gate (f_t), which discards unwanted information from the current cell state; (ii) the input gate (i_t), which adds new data to the current cell state; and (iii) the output gate (o_t), which produces an output from the current cell state. These gates perform specific operations on the cell states. The state of the LSTM network is split in two: the hidden state h_t, considered the short-term memory of the network, and the cell state c_t, considered its long-term memory. The operations performed within LSTM cells help the model retain information from sequential data; the network uses its cells as memory units. The gates, shown in Figure 2 and illustrated in Equations (2), (5), and (6), determine which data are carried forward.
a) The forget gate
The forget gate, f_t, determines how much information from the previous timestamp is transmitted, and it is the essence of the LSTM architecture: it determines how much memory is preserved from the previous memory state, c_{t−1}. In Equation (2), the previous hidden state, h_{t−1}, and the current input, x_t, are passed through the sigmoid activation function. Information associated with 0 is forgotten, while information associated with 1 continues to be carried through the cell state:

$$f_t = \sigma\left(w_{fh}\, h_{t-1} + w_{fx}\, x_t + b_f\right) \tag{2}$$

Where:
σ is the sigmoid activation function;
w_fh and w_fx are the weight matrices of the forget gate, f_t, for its connections to the previous hidden state h_{t−1} and to the input vector x_t;
b_f is the bias term of the forget gate.
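To make the gate arithmetic concrete, here is a minimal NumPy sketch of Equation (2). The dimensions, weight initialization, and input values are illustrative placeholders, not values from this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 16 hidden units, 3 input features
# (e.g., rainfall, discharge, water level).
n_h, n_x = 16, 3
rng = np.random.default_rng(0)
w_fh = rng.normal(scale=0.1, size=(n_h, n_h))  # weights to h_{t-1}
w_fx = rng.normal(scale=0.1, size=(n_h, n_x))  # weights to x_t
b_f = np.zeros(n_h)                            # forget-gate bias

h_prev = np.zeros(n_h)       # previous hidden (short-term) state
x_t = rng.normal(size=n_x)   # current input vector

# Equation (2): each component lies in (0, 1); values near 0 forget
# the corresponding entry of c_{t-1}, values near 1 keep it.
f_t = sigmoid(w_fh @ h_prev + w_fx @ x_t + b_f)
```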
The long-term state, c_{t−1}, flows through the network from left to right. It first passes through the forget gate, where some information is discarded, then receives new information through an addition operation (the added information is selected by the input gate), and the resulting c_t is sent on without further transformation. Thus, at each time step, some information is removed and some is added. The update of c_t is illustrated in Figure 2 and in Equations (3) and (4):

$$g_t = \tanh\left(w_{gh}\, h_{t-1} + w_{gx}\, x_t + b_g\right) \tag{3}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t \tag{4}$$

Where:
w_gh and w_gx are the weight matrices of the main layer, g_t, for its connections to the previous short-term state h_{t−1} and to the input vector x_t;
b_g is the bias term of the main layer;
⊙ denotes element-wise multiplication.
b) The input gate
The input gate, i_t, in Equation (5) determines which parts of the main layer, g_t, are used to update the long-term state. The previous and current information is updated according to the result of the sigmoid operation (σ): information associated with 0 is considered trivial, while information associated with 1 is deemed essential. In addition, the hyperbolic tangent activation function (tanh), which compresses values between −1 and 1, is used to regulate the network. The outputs of the sigmoid and tanh functions are then multiplied to select the information that will be updated:

$$i_t = \sigma\left(w_{ih}\, h_{t-1} + w_{ix}\, x_t + b_i\right) \tag{5}$$

Where:
w_ih and w_ix are the weight matrices of the input gate for its connections to the short-term state h_{t−1} and to the input vector x_t;
b_i is the bias term of the input gate.
c) The output gate
In Equations (6) and (7), the output gate, o_t, determines which parts of the long-term state are read and output at this time step, both in h_t and in y_t. After the addition operation, the long-term state is copied and passed through the hyperbolic tangent function (tanh), and the result is filtered by the output gate. The result is the short-term state, h_t, which is equal to the output y_t of the LSTM cell:

$$o_t = \sigma\left(w_{oh}\, h_{t-1} + w_{ox}\, x_t + b_o\right) \tag{6}$$

$$h_t = y_t = o_t \odot \tanh\left(c_t\right) \tag{7}$$

Where:
w_oh and w_ox are the weight matrices of the output gate for its connections to the short-term state h_{t−1} and to the input vector x_t;
b_o is the bias term of the output gate;
h_t is the output of the hidden layer at time step t.
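Putting Equations (2) through (7) together, the following self-contained NumPy sketch performs one LSTM time step. It is a schematic illustration of the cell described above, not the authors' implementation; the weight-dictionary layout and sizes are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Equations (2)-(7).
    W maps gate names to weight matrices, b to bias vectors."""
    f_t = sigmoid(W["fh"] @ h_prev + W["fx"] @ x_t + b["f"])  # forget gate, Eq. (2)
    g_t = np.tanh(W["gh"] @ h_prev + W["gx"] @ x_t + b["g"])  # main layer, Eq. (3)
    i_t = sigmoid(W["ih"] @ h_prev + W["ix"] @ x_t + b["i"])  # input gate, Eq. (5)
    c_t = f_t * c_prev + i_t * g_t                            # long-term state, Eq. (4)
    o_t = sigmoid(W["oh"] @ h_prev + W["ox"] @ x_t + b["o"])  # output gate, Eq. (6)
    h_t = o_t * np.tanh(c_t)                                  # short-term state = y_t, Eq. (7)
    return h_t, c_t

# Hypothetical sizes: 16 hidden units, 3 input features.
n_h, n_x = 16, 3
rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.1, size=(n_h, n_h if k.endswith("h") else n_x))
     for k in ("fh", "fx", "gh", "gx", "ih", "ix", "oh", "ox")}
b = {k: np.zeros(n_h) for k in "fgio"}
h_t, c_t = np.zeros(n_h), np.zeros(n_h)          # initial states
h_t, c_t = lstm_step(rng.normal(size=n_x), h_t, c_t, W, b)
```

Note that the element-wise products in Equations (4) and (7) are what let gradients flow through c_t largely unchanged, which is how the cell avoids the gradient instability discussed above.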
2.5. Long Short-Term Memory Model Configuration
In this work, optimizing the performance of the LSTM model involves selecting the input variables, determining an appropriate network architecture, optimizing network training, and using a reliable validation methodology. The LSTM network, as explained above, consists of an input layer, a single hidden layer, and an output layer, with sigmoid activation functions for the gate neurons and hyperbolic tangent for the hidden states. The learning-algorithm parameters, such as the number of hidden neurons, the optimization function, the number of iterations, and the batch size, are initialized and tuned using the random search cross-validation method (RandomizedSearchCV), during which we test and evaluate different combinations of inputs (rainfall, discharge, water level). The preprocessed database is divided into two parts:
– the training part (80%), the largest share, used to learn the system's dynamics;
– the testing part (20%), which guards against overfitting by monitoring the evolution of the loss function during training and validation. Once training stops, the interconnection weights of the best-performing model are saved. The validation dataset then confirms the performance of the LSTM model.
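As an illustration of this configuration step, the sketch below builds a single-hidden-layer LSTM in Keras, applies an 80/20 chronological split, and runs a small random search over hypothetical hyperparameter ranges. It uses a hand-rolled search loop rather than scikit-learn's RandomizedSearchCV to stay self-contained; the ranges, window length, and feature count are placeholders, not the values used in this study.

```python
import numpy as np
from tensorflow import keras

# Hypothetical preprocessed dataset: sliding windows of (rainfall,
# discharge, water level) as predictors, next-step water level as target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30, 3)).astype("float32")  # (samples, timesteps, features)
y = rng.normal(size=1000).astype("float32")

# 80/20 chronological split: the data are a time series, so we do not shuffle.
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

def build_model(n_units, optimizer):
    """Single hidden LSTM layer, as described in the text."""
    model = keras.Sequential([
        keras.layers.Input(shape=X.shape[1:]),
        keras.layers.LSTM(n_units),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer=optimizer, loss="mse")
    return model

# Random search over a placeholder grid (not the authors' search space).
search_space = {
    "n_units": [16, 32, 64, 128],
    "optimizer": ["adam", "rmsprop"],
    "epochs": [50, 100, 200],
    "batch_size": [16, 32, 64],
}
best_loss, best_params = np.inf, None
for _ in range(10):  # number of sampled hyperparameter combinations
    params = {k: v[rng.integers(len(v))] for k, v in search_space.items()}
    model = build_model(params["n_units"], params["optimizer"])
    model.fit(X_train, y_train, epochs=params["epochs"],
              batch_size=params["batch_size"], validation_split=0.2, verbose=0)
    loss = model.evaluate(X_test, y_test, verbose=0)
    if loss < best_loss:
        best_loss, best_params = loss, params

print("best test MSE:", best_loss, "with", best_params)
```

The chronological split mirrors the training/testing division described above; shuffling the windows before splitting would leak future information into training and overstate performance.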