1. Introduction
The ocean regulates the Earth’s climate and provides humans with valuable resources, such as energy and food [1]. Therefore, the sustainable use of marine resources is an emerging concern. Marine pollution related to human activities is a long-recognized problem, but it has received the necessary attention only in recent years, when the anthropogenic pressure on aquatic ecosystems and organisms reached a dangerous ecological threshold [2]. This intense anthropogenic pressure on the coastal environment is the result of the doubling of the human population and rapid industrial development [1]. Some of the anthropogenic activities affecting coastal zones involve inputs of excessive nutrients [3], heavy metals, and other land-derived pollutants, such as microplastics [4]. It is estimated that about 80% of marine pollution globally is land-derived [5]. The environmental degradation of coastal waters harms marine organisms and negatively impacts human wellbeing.
Eutrophication is considered a key local stressor for coastal marine ecosystems. According to a study by Smith [6], which examined 92 coastal ecosystems, coastal Chlorophyll-a (Chl-a) production was found to be related to two nutrients, Nitrogen (N) and Phosphorus (P). Furthermore, climate change and anthropogenic eutrophication have resulted in large variations in microalgal assemblage composition globally, such as an increase in Harmful Algal Blooms (HABs) or in biomass [7]. The main impacts of these changes in algal composition include hypoxia/anoxia [8], with catastrophic side effects on aquatic organisms (e.g., declining fisheries stocks). Additionally, eutrophication may trigger harmful bacterial production, which negatively affects corals and other marine organisms [9]. Another side effect of eutrophication is nuisance blooms, which have negative economic and societal impacts because they degrade the aesthetic quality of the water, for example through discoloration or foam [10].
The eutrophication of coastal waters is addressed by several EU Directives, including the Water Framework Directive (WFD) 2000/60/EC, the Marine Strategy Framework Directive (MSFD) 2008/56/EC and the Nitrates Directive 91/676/EEC, as well as by Regional Sea Conventions, such as the Barcelona Convention for the protection of the Mediterranean Sea. The assessment of surface water bodies and the examination of their physicochemical status, in order to identify anthropogenic pressure and possible changes, is a crucial issue for the associated environmental authorities [11]. Traditional methodologies analyse the data with statistical methods, such as cluster analysis and ordination. Modelling studies have demonstrated that suitable models, such as Artificial Neural Networks (ANNs), make it possible to examine the association/impact of several environmental parameters on water quality problems, like eutrophication [8].
Most water quality studies that use ANNs as modelling tools deal with limnological or riverine applications, while ANN applications examining coastal systems are considerably fewer [12]. Specifically, for eutrophication-related problems and their catastrophic effects, the use of ANN predictions can prevent or minimize the effects of possible HABs [13]. The well-known ability of ANNs to model complex and non-linear relationships makes them well suited to eutrophication modelling. As stated by Youssef et al. [14], in contrast to some other modelling techniques (e.g., statistical methods), ANNs are not affected by non-linearities or the complex interdependencies of interlayer connections.
Besides their ability to simulate algal productivity with good accuracy, ANNs are good at examining the effect of the related water quality parameters, based on their ability to associate them with the function of the algal biomass [15]. A category of ANNs broadly used in the eco-hydrological field is the unsupervised ANNs known as Kohonen Self-Organizing Maps (SOMs) [16]. SOMs are mainly used for clustering [17] and for exploratory data analysis (data mining) of the investigated environmental data set [18]. As stated by Park et al. [19], multivariate analysis is the method mainly applied for ecological patterning; however, ANNs are more suitable for this task because of the possibly nonlinear and complex interactions between the various parameters in the modelled data set (which often consists of many different species and sampling areas).
An example of a SOM model applied in eco-hydrological modelling is found in the study of Lu and Lo [20], where a trophic state classifier based on a SOM model was constructed to diagnose the water quality of the Fei-Tsui Reservoir (Taiwan) during the 1987–1995 monitoring period, and the simulated SOM results were compared with those of the Carlson Index. In the study of Li et al. [21], the SOM model was applied to evaluate the spatiotemporal variations of groundwater quality data in Northeast Beijing, where, based on the SOM’s clustering, different pollution sources (such as industrial and agricultural activities and domestic sewage discharge) were identified for the related sampling sites.
Another category of ANNs is the multilayer feed-forward neural networks, which are supervised learning-based ANNs. This type of ANN is capable of predicting Chl-a levels based on several water quality parameters associated with algal production [22]. These environmental parameters, which are used as the ANN’s inputs, may differ among modelling studies of coastal eutrophication. For example, Salami et al. [23] created a feed-forward back propagation ANN for predicting coastal Chl-a values near Grant Line Canal, California, USA, based on the Electric Conductivity (EC), water temperature (WT) and pH parameters. Even though only three monitoring parameters were used as the model’s inputs, the ANN predicted the Chl-a levels with a satisfactory accuracy rate (75.9%). In another study, by Melesse et al. [24], the coastal Chl-a levels at Florida Bay were modelled with a back propagation ANN. Specifically, the authors examined various combinations of seven candidate input parameters (total phosphate, nitrite, ammonium, turbidity, WT, dissolved oxygen (DO) and antecedent Chl-a) and concluded that the ANN performed better when using all of these input parameters.
Data-driven models based on ANN algorithms can support the development of eutrophication control management tools, since ANNs are able to reveal the underlying mechanisms linking algal productivity and the related environmental parameters [25]. Additionally, as stated by Georgescu et al. [26], the application of AI methods for water quality modelling saves time and resources in lab analysis, while the generated statistical data are important for the relevant authorities/managers. These practical reasons, and the fact that no similar SOM-based modelling study currently exists for the Cyprus coastal waters, motivated the current modelling study. The proposed SOM model enables us to better comprehend possible hidden mechanisms and interactions between the Chl-a parameter and the rest of the eutrophication-related parameters. In the proposed modelling study, we focus on the role/interactions of water quality parameters associated with eutrophication and on the impact of anthropogenic activity for several coastal stations near the Republic of Cyprus. The land use of the regions near the sea catchment area is reflected in the nutrient concentrations at the nearby coastal stations, and it is well documented that excessive amounts of nutrients in surface water may lead to eutrophication [27]. In our case, it was found that the water quality status of the Cyprus coastal waters is good and practically not impacted by anthropogenic activities. Nevertheless, the created data-driven models can act as advisory/management tools for assessing the expected pressure from planned anthropogenic activities or even environmental changes, like global warming.
4. Discussion
Eutrophication is an environmental issue closely related to anthropogenic activities. A vast number of monitoring studies point out the negative impact of these activities, which are responsible for nitrogen and phosphorus release into the water environment [3]. For example, in the water quality study of Papastergiadou et al. [50], long-term hydrological data and a GIS system were used for extracting land cover/use changes, and the authors concluded that anthropogenic activities seriously affect water quality and promote eutrophication. Therefore, understanding the interactions among eutrophication-related water quality parameters, and how each of these environmental parameters affects algal production, is the keystone of developing sustainable management practices and restoration measures in eutrophication-affected areas [51]. As stated by Peppa et al. [52], many environmental studies deal with the prediction and analysis of eutrophication phenomena and the related parameter interactions in order to identify possible causes and to provide possible solutions for the problem.
The maintenance/achievement of good water quality status is a goal for all European Union member countries, including the Republic of Cyprus. For that reason, as indicated before, several Directives must be implemented, like the Water Framework Directive (WFD), the Nitrates Directive and the Marine Strategy Framework Directive (MSFD). In this modelling study, data-driven modelling techniques are applied to model the coastal water quality in several areas of Cyprus. Based on the modelling outputs, the Chl-a levels can be predicted, and the eutrophication-related water parameters and their contribution to Chl-a production can also be evaluated. Specifically, two different types of ANNs were utilized for the needs of this modelling study. Firstly, an unsupervised type of ANN was created, specifically the SOM model. Secondly, a supervised type of ANN, the feed-forward ANN, was developed. Combining the output information provided by these two types of ANNs enabled an in-depth investigation of the eutrophication phenomenon. In their study, Youssef et al. [14] state that ANNs perform better than other machine learning and statistical methods; however, their black box nature makes their outcomes difficult to interpret and explain in practice. In our case, the parallel use of the SOM’s results and the outcomes of the feed-forward ANN’s sensitivity analysis enabled us to unravel hidden complex mechanisms between the Chl-a parameter and the rest of the water quality parameters. As stated by Chon [53], the integration of the SOM and multilayer perceptron (MLP) models promotes advanced information extraction from water quality data sets.
According to Kalteh et al. [44], the SOM can be characterized as a modelling technique suitable for investigating many types of aquatic systems and water resources processes. The same authors also state that the SOM can group data into homogeneous areas, which is useful when information must be transferred from gauged to ungauged sites (like geographically remote areas). Another useful property of the SOM comes from its clustering capabilities and the heat maps associated with the CPs, which allow visual qualification of the relationships between the input parameters [54]. The SOM is very beneficial when the correlation between the input parameters is non-linear and/or when dealing with noisy data; under those conditions, the CPs can reveal relationships between the data that would not otherwise be detected [55]. In their study, Astel et al. [56] emphasize the SOM’s classification and visualization ability for large water quality data sets, and also mention the SOM’s ability to show the water quality parameters and their spatial and temporal changes simultaneously, based on the CPs visualization. Meanwhile, Varbiro et al. [57] argue for the SOM’s superiority over traditional multivariate statistical methods (like cluster analysis and ordination) because of its ability to simplify the complex statistical relationships between the variables into simple geometric relationships represented in a 2-dimensional space.
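To make the SOM mechanics described above more concrete, the following is a minimal pure-Python sketch, not the model used in this study. It assumes a small rectangular map, a Gaussian neighbourhood, and a linearly decaying learning rate and radius. The function names (`train_som`, `best_matching_unit`, `component_plane`), the 3x3 map size and the toy two-parameter samples are illustrative assumptions. A component plane is taken here to be the grid of codebook values for a single input parameter.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the map node whose codebook vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=200, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on data (a list of equal-length, pre-scaled vectors)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # weights[r][c] is the codebook vector of map node (r, c), randomly initialised
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3    # neighbourhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def component_plane(weights, k):
    """Grid of codebook values for input parameter k (one value per map node)."""
    return [[w[k] for w in row] for row in weights]


# Made-up, already scaled samples of two parameters: a low group and a high group
data = [[0.10, 0.12], [0.08, 0.15], [0.12, 0.09],
        [0.90, 0.88], [0.92, 0.85], [0.87, 0.91]]
weights = train_som(data, rows=3, cols=3)
bmu_low = best_matching_unit(weights, data[0])
bmu_high = best_matching_unit(weights, data[3])
plane0 = component_plane(weights, 0)
```

Samples that land on the same or neighbouring nodes form a cluster, and comparing the component planes of two parameters node by node is what allows their relationship to be read visually, as the cited studies describe.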
The second ANN implemented in this modelling study was the feed-forward ANN, which is a supervised type of ANN. Feed-forward ANNs are able to model non-linear complex environmental systems [58]. Additionally, as stated by Bushra et al. [59], backpropagation ANNs have the merit of being simple to adapt, and no tuning or learning is required for their parameter and function features. Furthermore, as stated by Brown et al. [60], ANN models give more reliable outputs than other machine learning methods (e.g., decision trees or linear regression) when the number of data measurements is relatively small, as in our case. Generally, feed-forward ANNs are considered reliable predictors of the Chl-a parameter and are widely used for predicting Chl-a levels [8].
As mentioned above, the feed-forward ANN model reproduced the Chl-a levels with high accuracy, and the error between the measured and the predicted data is very small, which is easily observed from the graphical illustrations. For relatively low to medium values of the Chl-a parameter, the ANN produced outputs almost identical to the measured data. For elevated Chl-a values, the ANN’s error tends to increase; however, the calculated values are still near the measured ones, suggesting that the ANN generalizes well. Despite these small errors, the ANN correctly categorized the trophic status for all data samples.
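As a minimal sketch of how the agreement between measured and predicted Chl-a values can be quantified, the snippet below computes a correlation coefficient and a root-mean-square error. It assumes that R denotes the Pearson correlation coefficient; the helper names `pearson_r` and `rmse` and the five value pairs are made up for illustration and are not data from this study.

```python
import math


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(predicted) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    sx = math.sqrt(sum((x - mx) ** 2 for x in measured))
    sy = math.sqrt(sum((y - my) ** 2 for y in predicted))
    return cov / (sx * sy)


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(measured, predicted)) / n)


# Hypothetical values: the prediction error grows for the elevated sample
measured = [0.05, 0.10, 0.20, 0.40, 0.80]
predicted = [0.06, 0.09, 0.21, 0.38, 0.70]
r = pearson_r(measured, predicted)
err = rmse(measured, predicted)
```

In this toy example the largest residual belongs to the highest value, which mirrors the pattern described above: a high overall correlation can coexist with larger errors at elevated Chl-a levels.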
The perturb sensitivity analysis algorithm was applied, with each input parameter in turn increased by 10%. Based on the sensitivity analysis results, the basic trends between each input parameter and the Chl-a parameter were observed. Under this +10% perturbation, the salinity parameter was found to be the most influential, since it produced the largest change in the Chl-a levels.
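The perturbation procedure described above can be sketched as follows, assuming a one-at-a-time scheme in which a single input is scaled by +10% while the others are held fixed and the mean relative change of the output is recorded. The function `perturb_sensitivity` is an illustrative helper, and `toy_model` is a hypothetical stand-in for a trained network; its coefficients and the three samples are arbitrary and do not reproduce the results of this study.

```python
def perturb_sensitivity(model, samples, names, delta=0.10):
    """Mean relative change (%) of the model output when one input is scaled by (1 + delta)."""
    baseline = [model(x) for x in samples]
    result = {}
    for j, name in enumerate(names):
        changes = []
        for x, y0 in zip(samples, baseline):
            x_pert = list(x)
            x_pert[j] = x[j] * (1.0 + delta)   # perturb only input j
            y1 = model(x_pert)
            if y0 != 0:                        # skip samples with zero baseline output
                changes.append(100.0 * (y1 - y0) / abs(y0))
        result[name] = sum(changes) / len(changes) if changes else 0.0
    return result


def toy_model(x):
    """Hypothetical stand-in for a trained ANN (arbitrary linear response)."""
    salinity, wt, do = x
    return 0.05 * salinity - 0.02 * wt - 0.03 * do


# Made-up samples of (salinity, WT, DO)
samples = [[39.0, 18.0, 7.5], [38.5, 24.0, 6.8], [39.2, 16.5, 7.9]]
sens = perturb_sensitivity(toy_model, samples, ["salinity", "WT", "DO"])
ranking = sorted(sens, key=lambda k: abs(sens[k]), reverse=True)
```

The sign of each entry gives the direction of the association and its magnitude gives the ranking of influence. With a real network, `toy_model` would be replaced by the trained model's prediction function.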
The WT and DO parameters were also found to be significantly influential concerning Chl-a production. For the WT parameter, it was calculated that WT and Chl-a are negatively associated. This finding agrees with the fact that the coastal Chl-a levels near Cyprus reach their maximum values during the winter to early spring months, when cooler temperatures prevail, following the winter mixing and the increase of phytoplankton production [61]. This was also recorded by Fyttis et al. [62] during a monitoring study of 12 consecutive months (January to December 2016), in which the maximum coastal Chl-a levels of Cyprus were recorded during the winter. Regarding the salinity parameter, the Chl-a levels decrease significantly when the salinity decreases and vice versa. The upwelling phenomenon is again suggested to be related to this, since during upwelling nutrient-rich water emerges at the surface [63].
The upwelling might also explain the strong negative relationship between the DO and the Chl-a parameters. In a study by Georgiou et al. [64] in the Amvrakikos Gulf (Greece), low oxygen levels are reported during the winter months. These authors attribute the anoxia to the strong winds and the resulting upwelling phenomenon. Therefore, the wintertime upwelling (and wind speed) is a factor that should be considered in future water quality modelling studies in Cyprus. As mentioned by Suursaar [65], the wintertime upwelling is a phenomenon that has been ignored and not given the necessary attention, in contrast to the summer upwelling.
The remaining parameters seem to contribute less to algal production. The feed-forward ANN captured the relationship between the phosphorus and the Chl-a parameters, where increased values of phosphorus are positively related to increased algal production and vice versa. As stated by Ren et al. [66], high levels of dissolved inorganic phosphorus, mainly in the form of phosphate in the water column, could enhance algal production. Regarding the DIN species, a less important relationship with the Chl-a parameter is found, which behaves similarly to that of the phosphorus parameter. A major source of DIN to coastal waters is atmospheric deposition. More broadly, two main sources of DIN are related to anthropogenic activities, specifically riverine inputs and atmospheric deposition. In the study of Paerl et al. [67], conducted along the U.S. coast and the eastern Gulf of Mexico, it was estimated that atmospheric nitrogen deposition was responsible for between 10% and 40% of the new nitrogen loadings. According to Droge and Kroeze [68], riverine inputs are considered the main source of nitrogen for coastal waters and, as estimated by the authors based on modelling studies, the DIN export will keep increasing in comparison to the pre-industrial era.
The development of data-driven models provides a valuable scientific tool for coastal water quality modelling. In our case, the integration of a supervised and an unsupervised ANN proved to be a successful combination, not only for predicting the Chl-a levels, but also for examining the interactions of the eutrophication-related parameters. The sensitivity analysis results indicated how increases or decreases in each parameter tend to affect the algal production mechanism, positively or negatively. At the same time, the SOM model enabled an in-depth examination of the water quality parameters data set. Specifically, the resulting clustering of the data revealed biological mechanisms regarding algal production between the groups that are not apparent when the data set is examined as a whole. The SOM’s results revealed hidden relationships between the water quality parameters, which could not be easily identified or understood based on other modelling procedures. The visualization ability and the grouping of the SOM enabled us to make associations for specific value ranges of the parameters. As highlighted by Duarte et al. [69], complex patterns and interactions between the input parameters can be interpreted and understood based on the CPs visualization.
Regarding the nutrients, based on the SOM’s results, the Chl-a parameter and the NH₄⁺, NO₂⁻ and PO₄³⁻ parameters have similar box plots and CPs, suggesting a strong relationship between the Chl-a parameter and NH₄⁺, NO₂⁻ and PO₄³⁻. Regarding the NO₃⁻ parameter, its moderate concentrations, based on the SOM clustering, are associated with the highest Chl-a values. The SOM’s clustering of the data set (see Figure 4) verified the good water quality status of Cypriot coastal water, since only 1.4% of the total samples were characterized as problematic by the SOM results. In their study, Varbiro et al. [70] apply the SOM to evaluate the Danube’s tributaries based on diatom associations, and conclude that the upper stretch (German-Austrian region) has better water quality than the lower stretch (Slovakian-Hungarian region). The SOM’s visualization ability, which allows the data samples to be clustered while the parameters’ concentration levels for each cluster are compared based on the corresponding CP region, enables conclusions to be drawn about the different sampling stations and their association with different water quality status. In our case, this finding can provide important information about eutrophication to the local authorities, since it indicates that not all nutrients need the same treatment regarding eutrophication control, as analysed above based on the box plot results (Figure 5).
Despite the limiting factor of the relatively small data set used in this modelling study, the created ANNs not only performed well, but also captured biological mechanisms/relationships and special characteristics describing the coastal algal production in Cyprus, like the winter upwelling phenomenon mentioned above. The issues related to poor ANN performance because of limited data for learning are discussed in the study of Scardi [71], where the author suggests that new approaches, like the adaptation of co-predictors, could help ANNs overcome this problem. It must be noted that in a previous modelling study, Hadjisolomou et al. [72] developed a feed-forward ANN, which predicted the surface coastal Chl-a levels near Cyprus with good accuracy (R = 0.87 for the test). However, that data set was much smaller (n = 681) than the data set of this modelling study (n = 1552). For that reason, the previous model was validated by applying the k-fold method, while the topology of that ANN was also different (9-8-1). As explained by Hadjisolomou et al. [25], the application of the k-fold method might raise some concerns related to the small data set left for testing, and therefore the evaluation might become less reliable and robust. Another important detail related to the nature of the data set analysed in Hadjisolomou et al. [72] was that only one sporadic measurement with an elevated Chl-a value was recorded. As expected, the current ANN created for the needs of this modelling study has better performance (R = 0.97 for the test set), while differences in the parameters’ sensitivity analysis results are also observed. These differences are mainly attributed to the fact that the current ANN is based on a data set that contains a significant number of high/elevated Chl-a measurements. Therefore, besides performing better, the current ANN can generalize better in situations where the algal production is increased. So, the creation of updated ANN models based on denser measurements and a bigger database would provide even more valuable information and could allow us to better understand the algal production mechanisms.
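The k-fold validation mentioned above for the earlier, smaller data set can be sketched as follows. This is a minimal illustration under stated assumptions: only n = 681 is taken from the text, while k = 10 and the shuffling seed are hypothetical (neither is given here), and `k_fold_indices` is an illustrative helper rather than the authors' code.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split sample indices into k (train, test) pairs with disjoint test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # fold sizes differ by at most one
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        splits.append((train, test))
    return splits


# n = 681 as reported for the earlier data set; k = 10 is an assumed value
splits = k_fold_indices(681, 10)
test_sizes = [len(test) for _, test in splits]
```

With these assumed settings each test fold holds only 68 or 69 samples, which illustrates the concern noted above: when the full data set is small, each evaluation rests on few samples, and fewer still with elevated Chl-a values.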
In situ data collected from monitoring campaigns are usually few and costly to obtain. As noted by Xu et al. [73], developing ML models from a small data set may lead to modelling complications (e.g., the model tends to overfit or underfit the data). Water quality data originating from other sources (besides monitoring cruises) are therefore an alternative. For example, Shan et al. [74] demonstrate the need for more advanced modelling approaches (like the Long Short-Term Memory model) that use online monitored data to simulate the complicated patterns of algal growth since, as the authors state, no persuasive conclusions can be extracted from statistical analysis of monitoring campaign data alone. The laborious and costly nature of monitoring campaigns is also emphasized in the study of Silva et al. [75], where a modified Water Quality Index is developed based on sensor-derived data. Taking into account the above and the generally accepted opinion that ML models are data-hungry [76], the creation of data-driven models based on data collected from monitoring cruises fused with data from several other available sources (like satellite data, buoy data, historical databases) is recommended. Data-driven models built on such a hybrid database would enable the development of even more specialized models, able to forecast the Chl-a levels ahead both spatially and temporally. Additionally, management scenarios for economic activities (e.g., aquaculture) can be studied. For example, Giangrande et al. [77] discuss the integration of different trophic levels with mariculture in the Mediterranean Sea and the role of restoration ecology, and Eze et al. [78] developed a deep neural network based on real sensor data for the prediction of water quality parameters for a South African aquaculture farm.
It is generally accepted that water quality monitoring is a time-consuming and expensive procedure [51]. Utilizing ANNs for the modelling of water quality parameters is considered best practice compared to other experimental or monitoring methods, which are usually costly or take too long for the data gathering [79]. In the study by Ahmed et al. [80], the various methods available for estimating the DO concentration are analysed; the authors state that most of these analytical methods are time-consuming and/or expensive, while conventional data processing techniques are inappropriate since they are affected by non-linearities. Therefore, these authors propose ML data-driven models for water quality prediction purposes. The ML data-driven models used for prediction are able to overcome modelling limitations related to complex and non-linear data sets, and are therefore widely used in water quality modelling [31,81]. To summarize, the results of our study indicate that the utilization of ANNs for the identification of areas sensitive to eutrophication is of great importance to local authorities and policy makers, allowing them to apply measures when needed for the protection of the marine environment, especially in areas where scientific knowledge is limited or where data availability/acquisition is difficult.