1. Introduction
Tourism is one of the fastest-growing economic sectors in coastal areas, which attract large numbers of tourists and bring socioeconomic benefits to human society [1] owing to their convenient transport and natural environment [2]. For instance, tourism provided 292 million jobs, about 10% of the world’s workforce, and contributed about 10% of global GDP in 2016, a share expected to reach 11.4% by 2027 [3]. In 2018, about 1.4 billion tourists were recorded globally [4]. However, natural environmental variability can also cause diverse natural disasters [5]. For example, flash floods caused 11 deaths and forced the evacuation of approximately 4,000 tourists in Jordan in November 2018 [6]. In July 2012, a devastating flash flood caused by heavy rainfall struck Yesanpo, a nature-centered tourist destination near Beijing, leaving over 15,000 visitors trapped overnight [7].
Globally, flood inundation is recognized as one of the most common natural disasters, and it has caused catastrophic property damage and loss of life over the past decades [8,9]. Statistically, floods constitute 43% of all natural disasters and 47% of weather-related disasters. Floods affected 2.3 billion people and caused $662 billion in damage from 1995 to 2015 [10]. Approximately 16,000 lives were lost in flash floods in China between 2000 and 2018, accounting for 74% of all flood-related deaths [7]. In 2021, about 400 disastrous events were recorded by the Emergency Event Database, of which 223 were floods. The most severe was the Henan flood in China, which caused 352 deaths, affected 14.5 million people, and resulted in $16.5 billion in economic losses (https://reliefweb.int/report/world/2021-disasters-numbers).
Coastal areas are not only the most developed places but also extraordinarily flood-prone, since flood frequency and density there are higher than elsewhere under extreme climate events such as tropical cyclones and typhoons [3,11-13]. In 2006, super typhoon Sang Mei caused 153 deaths in Wenzhou, Zhejiang province, and about RMB 11 billion in direct economic losses [14]. In 2013, typhoon-triggered flood inundation affected eight million residents and caused about RMB 33 billion in direct economic losses in Ningbo, Zhejiang province [15]. Therefore, predicting and understanding the potential flood inundation risk (FIR) for tourism in coastal areas is of great importance for regional sustainable development, because it helps minimize possible harm.
Accordingly, many approaches have been applied in the flood-tourism field to find suitable ways of mitigating the negative impacts of floods on tourism. Some studies used citizen-based activities to improve resilience [16], and local knowledge has proved efficient for improving the quality of hazard preparation [17,18]. In addition, climate change models connected with socioeconomic data [19], taxable sales records [20], etc., were used to estimate economic losses for tourism. Geographic Information Systems (GIS) have the advantage of combining models with gridded datasets, such as environmental rasters and socioeconomic features [21]. GIS is a suitable tool for deriving regional indicators and evaluating their impacts on hotels [22], properties [23], and spatial accessibility in FIR [24]. GIS was further combined with Remote Sensing (RS) and with hydrological and hydrodynamic flood simulation models, such as FLO-2D [25] and HAZUS-MH [26], to assess flood scenarios for tourism facilities [2,27]. Moreover, some comparatively advanced algorithms in Machine Learning have been successfully used in flood risk evaluation as well, such as Bayesian networks [28,29] and the AHP-SA model [21,30]. These methods explored the mechanism of flood disasters in depth by integrating multiple factors, such as rainfall, soil, and rivers. However, modeling FIR for tourism across large areas with these methods can be difficult because of model complexity and the advanced, professional mathematical knowledge they require. In addition, the computational cost of complex models must be considered in long-term spatial data evaluation.
Given these limitations, k-Nearest Neighbors (kNN) was proposed and used to assess FIR for coastal tourism in our previous investigation [31]. The reasonable results demonstrated that kNN is a simple but efficient algorithm: it makes no assumption about the distribution of the original data, which makes data training and calculation faster. It has also been used in a range of studies, such as the classification of missing data, risk evaluation, and prediction [31-33]. Although the kNN method has some merits, some problems remain to be explored and solved. For example, it is a lazy algorithm, and all features are treated as equally weighted regardless of their importance, which may lead to poor classification performance.
Therefore, this study extends our previous kNN-based investigation for tourism by using a distance-weighted method to improve the evaluation accuracy (EA) and performance of the kNN method. The aims and innovations of this paper can be summarized as follows:
Improved the performance of the kNN algorithm with a distance-weighted method, and demonstrated that the weighted kNN (WkNN) achieves higher prediction accuracy than kNN;
Developed and applied a WkNN-based spatial framework with spatial technologies to flood risk assessment for tourism in coastal areas;
Given the limited availability of spatially gridded data, used the World Environment Situation Room (WESR) for the first time to validate flood risk in coastal areas, and demonstrated that WESR can be successfully used in flood risk evaluation.
2. Framework Development
2.1. Basic Principle of kNN
The basic principle of kNN assumes that examined objects are similar to their k nearby sample neighbors, or at least have similar characteristics (for details, see Liu, Liu and Tan [31]). On this basis, there are two main steps to classify the examined objects:
Calculate the pairwise distance between each examined object in the testing dataset and its k nearby sample neighbors in the training dataset;
Take a vote among the categories of the k nearby samples to determine the classification of each examined object.
The distance quantifies the similarity between an examined object and a sample object: usually, the smaller the distance, the higher the similarity. Many metrics have been used in kNN to calculate the distance between objects, such as Manhattan Distance [34], Minkowski Distance [35], and Chebyshev Distance [36]. Among them, Euclidean Distance [37] is a popular and frequently used method. It refers to the straight-line distance between objects in Euclidean space, which can be described as:
\[ d(x, x_i) = \sqrt{\sum_{j=1}^{m} \left( x_j - x_{ij} \right)^2}, \quad i = 1, \dots, k \tag{1} \]
where \(x = (x_1, \dots, x_m)\) are the m features of the examined object, \(x_i = (x_{i1}, \dots, x_{im})\) are the features of the i-th sample neighbor, whose known category is \(y_i\), \(d(x, x_i)\) is the distance between them, and k is the number of nearby neighbors.
The value of k has a significant impact on the classification results of kNN. Values of k that are too small may produce a complex kNN model that overfits, whereas values that are too large may produce an overly simple model that underfits [38]. Thus, a proper k value lies between the two extremes and should be examined during model building. Traditionally, the k neighbors in the training dataset that are nearest to an examined object in the testing dataset are found first. The category of the examined object is then determined by the following decision rule:
\[ \hat{y} = \arg\max_{c_j} \sum_{x_i \in N_k(x)} I\left( y_i = c_j \right) \tag{2} \]
where \(\hat{y}\) is the predicted category, \(N_k(x)\) is the set of the k nearby neighbors of the examined object \(x\), \(y_i\) is the known category of neighbor \(x_i\), \(c_j\) is a candidate category, and I is the indicator function, that is, I = 1 when \(y_i = c_j\) and I = 0 otherwise.
Equations 1 and 2 show that the predicted category of an examined object is determined by the majority category among its neighbors. However, differences in weight or importance among the neighbors are ignored: a distant neighbor counts as much as a very close one, which can lower the classification accuracy. Therefore, distance-based weights can be introduced to improve the accuracy of kNN.
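The two steps described above (Euclidean distance, then a majority vote among the k nearest samples) can be sketched in a few lines. This is a minimal Python illustration, not the authors' implementation; the function names `euclidean` and `knn_predict` and the toy data (two standardized indices per grid cell) are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn_predict(train_X, train_y, x, k):
    """Return the majority category among the k training samples nearest to x."""
    # Sort the (features, category) pairs by distance to x and keep the k nearest.
    nearest = sorted(zip(train_X, train_y), key=lambda s: euclidean(s[0], x))[:k]
    # Each neighbor casts one equal vote for its own category.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two standardized flood indices per grid cell.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8), (0.7, 0.9)]
train_y = ["low", "low", "high", "high", "high"]

print(knn_predict(train_X, train_y, (0.3, 0.3), k=3))  # prints "low"
```

With k = 3, the two "low" samples and one "high" sample are the nearest neighbors of (0.3, 0.3), so the vote returns "low". Every neighbor counts equally here, which is the limitation that the weighted variant addresses.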
2.2. Weighted kNN (WkNN)
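Weights refer to the importance or contribution of factors to a system. Many approaches can be used to calculate weights, such as entropy methods [39], Analytic Hierarchy Process [30], and Principal Component Analysis [40]. However, these methods require complex expert knowledge and calculation. In this study, the weight in kNN is simply expressed as the inverse of the Euclidean distance (equation 3), which means the larger the distance, the smaller the weight:
\[ w_i = \frac{1}{d(x, x_i)} \tag{3} \]
where \(w_i\) is the weight of the i-th nearby neighbor \(x_i\) and \(d(x, x_i)\) is its Euclidean distance to the examined object \(x\).
Then equation (2) can be described as:
\[ \hat{y} = \arg\max_{c_j} \sum_{x_i \in N_k(x)} w_i \, I\left( y_i = c_j \right) \tag{4} \]
where \(\hat{y}\) is the predicted category, \(N_k(x)\) is the set of the k nearby neighbors of \(x\), \(y_i\) is the known category of neighbor \(x_i\), \(c_j\) is a candidate category, and I is the indicator function.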
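To make the effect of inverse-distance weighting concrete, the following Python sketch compares an equal vote with a weighted vote on the same neighbors. It is an illustration under stated assumptions rather than the authors' code: the function name `classify`, the toy data, and the small constant `eps` (added to the distance so that an exact match does not divide by zero) are all hypothetical choices not taken from the paper.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def classify(train_X, train_y, x, k, weighted=True, eps=1e-12):
    """Vote among the k nearest samples; weight each vote by 1/distance if weighted."""
    # (distance, category) pairs for the k samples nearest to x.
    nearest = sorted((euclidean(f, x), label) for f, label in zip(train_X, train_y))[:k]
    totals = defaultdict(float)
    for d, label in nearest:
        # eps is an assumption of this sketch: it guards against d == 0.
        totals[label] += 1.0 / (d + eps) if weighted else 1.0
    # The category with the largest (weighted) vote total wins.
    return max(totals, key=totals.get)

# Hypothetical toy data: one very close "high" sample, two farther "low" samples.
train_X = [(0.5, 0.5), (0.9, 0.9), (0.9, 1.0)]
train_y = ["high", "low", "low"]
x = (0.5, 0.6)

print(classify(train_X, train_y, x, k=3, weighted=False))  # prints "low"
print(classify(train_X, train_y, x, k=3, weighted=True))   # prints "high"
```

In this example the equal vote follows the two distant "low" samples, while the weighted vote follows the single "high" sample at distance 0.1, whose weight (about 10) exceeds the combined weight of the other two (about 3.8).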
2.3. Framework Conceptualization
After reviewing similar research, a WkNN-based spatial framework of FIR assessment for coastal tourism was conceptualized and constructed. The framework can be divided into three parts: input collection, model construction, and output classification (Figure 1).
The first module collects spatiotemporal data and derives flood-related indices. The data consist of three spatial branches: climate, environment, and validation data. Several indices are derived from the flood-inducing factors, which range from mean annual rainfall to drainage density. Flood hazard data for different year return periods (YRP) were collected to create the maximum inundation extent (MIE), together with the number of historical inundations, which is used to verify the evaluation results of the WkNN model.
The second module is the central part of the framework. Following data collection, all spatial indices are standardized into datasets with four categories: very low, low, medium, and high risk. The standardized datasets within the extent of the MIE were divided into two parts: a 70% training dataset and a 30% testing dataset. kNN and WkNN were employed to infer the categories of randomly selected records from their k nearest records in the training dataset. The inferred categories were compared with the records' existing categories, producing a confusion matrix and the Overall Accuracy (OA). The WkNN model with the highest OA value is then extended to the whole study area. A sensitivity analysis was conducted to explore the relationship between the inputs and outputs of the model.
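The evaluation loop of this module (a 70/30 split, a confusion matrix, OA, and the choice of k) can be sketched as follows. This is a hedged Python illustration only: the records are synthetic, the helper names (`wknn`, `split`, `confusion_matrix`, `overall_accuracy`), the random seeds, and the candidate k values are hypothetical, and the sketch assumes that OA is computed on the held-out 30% of records. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math
import random
from collections import defaultdict

CLASSES = ["very low", "low", "medium", "high"]

def wknn(train, x, k, eps=1e-12):
    """Inverse-distance-weighted vote among the k nearest (features, label) pairs."""
    nearest = sorted((math.dist(f, x), label) for f, label in train)[:k]
    totals = defaultdict(float)
    for d, label in nearest:
        totals[label] += 1.0 / (d + eps)  # eps (assumed) avoids division by zero
    return max(totals, key=totals.get)

def split(records, train_frac=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def confusion_matrix(train, test, k):
    """Rows are actual categories, columns are predicted categories."""
    index = {c: i for i, c in enumerate(CLASSES)}
    matrix = [[0] * len(CLASSES) for _ in CLASSES]
    for features, actual in test:
        matrix[index[actual]][index[wknn(train, features, k)]] += 1
    return matrix

def overall_accuracy(matrix):
    """Share of records on the diagonal, i.e. correctly classified."""
    total = sum(map(sum, matrix))
    return sum(matrix[i][i] for i in range(len(matrix))) / total

# Synthetic records (hypothetical): two indices clustered around one center per class.
rng = random.Random(42)
records = [((rng.gauss(i / 3, 0.12), rng.gauss(i / 3, 0.12)), c)
           for i, c in enumerate(CLASSES) for _ in range(50)]

train, test = split(records)
# Evaluate several candidate k values and keep the one with the highest OA.
scores = {k: overall_accuracy(confusion_matrix(train, test, k)) for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Sweeping k in this way mirrors the selection of the model with the highest OA; replacing `wknn` with an equal-vote classifier would give the corresponding kNN baseline for comparison.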
The third module maps and evaluates the likelihood of FIR and assesses the tourism facilities exposed to FIR.
5. Conclusions
This study develops an innovative spatial framework that integrates WkNN, GIS, and flood-related indices to infer, map, and evaluate the distribution of flood inundation for tourism. The improved WkNN was developed from kNN by using weights that are inversely proportional to distance in GIS. GIS was used as a spatial tool to derive flood-influenced indices by collecting and processing a number of spatial factors at multiple temporal and spatial resolutions from different sources. Among the flood-related factors, WESR was used for the first time to validate the predicted FIR for tourism. The WkNN-based framework was carried out effectively in the case study and obtained reasonable outcomes, which further demonstrated that WkNN is superior to kNN in flood risk analysis and evaluation accuracy (EA). Meanwhile, k values remain significant parameters for kNN and WkNN; suitable k values improve the EA of the models. The WkNN outcomes match the WESR data well, which can provide a basis for flood disaster prevention and mitigation for tourism in coastal areas and help decision-makers adopt effective measures against the negative impacts of flood disasters.
The innovative spatial framework was implemented with GIS and R programming and is repeatable; it can be flexibly used in other disaster-related investigations and is not limited by the number of model inputs. The evaluation results change in response to different input indices. However, the study has some limitations. For example, owing to limited data sources, it did not make full use of Remote Sensing imagery, such as Synthetic Aperture Radar, in flood risk assessment. In addition, it did not assess the adverse economic consequences of flooding for the tourism industry. As a further step, we plan to investigate these topics in depth and provide more precise assessments.
Author Contributions
Conceptualization, R.L. and S.L.; methodology, R.L.; software, N.T.; validation, R.L., S.L. and N.T.; formal analysis, S.L.; investigation, S.L.; resources, S.L.; data curation, N.T.; writing - original draft preparation, S.L.; writing - review and editing, R.L.; visualization, N.T.; supervision, R.L.; project administration, S.L.; funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.