High-Resolution Land Use Land Cover Dataset for Meteorological Modelling – Part 1: ECOCLIMAP-SG+ an Agreement-Based Dataset

Geoffrey Bessardon; Thomas Rieutord; Emily Gleeson; Sandro Oswald; Bolli Palmason

doi:10.20944/preprints202409.0953.v1

Submitted: 11 September 2024

Posted: 12 September 2024


Abstract

ECOCLIMAP-SG+ is a new 60 m land use land cover dataset that covers a continental domain and represents the 33 labels of the original ECOCLIMAP-SG dataset. ECOCLIMAP-SG is used in HARMONIE-AROME, the numerical weather prediction model used operationally by Met Éireann and other national meteorological services. ECOCLIMAP-SG+ was created using an agreement-based method that combines information from many maps to overcome variations in semantic and geographical coverage, resolutions, formats, accuracies, and representative periods. In addition to ECOCLIMAP-SG+, the process generates an agreement score map, which estimates the uncertainty of the land cover labels in ECOCLIMAP-SG+ at each location in the domain. This work presents the first evaluation of ECOCLIMAP-SG and ECOCLIMAP-SG+ against the following trusted land cover maps: LUCAS 2022, the Irish National Land Cover 2018 dataset, and an Icelandic version of ECOCLIMAP-SG. Using a set of primary labels, ECOCLIMAP-SG+ outperforms ECOCLIMAP-SG in terms of F1-score against both LUCAS 2022 over Europe and the Irish National Land Cover 2018 dataset. Similarly, it outperforms ECOCLIMAP-SG against the Icelandic version of ECOCLIMAP-SG for most of the represented secondary labels. The score map shows that the quality of ECOCLIMAP-SG+ is heterogeneous. It could be improved as new maps become available, but we do not control when that will happen. Therefore, the second part of this publication series aims at improving the map using machine learning.

Keywords:

land cover land use; meteorology; uncertainty quantification

Subject:

Environmental and Earth Sciences - Atmospheric Science and Meteorology

1. Introduction

To estimate the parameters required for calculating turbulent, radiative, heat, and moisture fluxes from the surface of the Earth, numerical weather prediction (NWP) models need information describing the Earth’s surface. Physiographic databases are used to provide such information. These databases comprise a land use land cover (LULC) map and a complementary set of geophysical datasets such as leaf area index, albedo, tree heights, and lake depths. A LULC map gathers a set of identifiable features from observations (generally remote-sensing) into classes chosen by the map producer. The LULC map is a pivotal element of the physiographic database used in NWP models as it triggers the use of different physical parametrisations and biophysical arrays associated with each LULC class. Thus, the classes of the LULC map used in each NWP model need to address the surface physics requirements of the model.

Meteorological organisations use different LULC maps in their NWP models [1]. These maps vary in the number of land-cover classes and grid size (or resolution), but all aim to represent the sub-grid surface heterogeneity of the NWP model correctly. To achieve this, the grid size of the LULC map must be significantly smaller than the NWP model grid spacing. Met Éireann, the Irish Meteorological Service, uses the HARMONIE-AROME canonical model configuration of the shared ALADIN-HIRLAM NWP system for short-range operational weather forecasting with a grid spacing of 2.5 km [2]. The LULC map used with HARMONIE-AROME is the latest version of ECOCLIMAP [3,4], called ECOCLIMAP Second Generation (ECOSG)¹, at 300 m resolution [5], roughly eight times finer than the HARMONIE-AROME grid spacing. Over the past few years, sub-kilometer resolution NWP experiments have emerged with grid spacings as fine as 100 m [6,7,8]. At such grid spacings, the current resolution of ECOSG is insufficient, and a higher-resolution LULC map with ECOSG labelling is necessary. It is essential to retain the ECOSG labelling to avoid the need to rewrite the surface physics code.

Many high-resolution maps exist [9,10,11,12]. As described in [13], these maps can be global, continental [9,10], national [12] or even local [14], and all have limitations due to spatial coverage, spatial accuracy, semantic accuracy and how up-to-date they are. The ECOSG physiographic database has a land cover map with 33 labels, far fewer than the 215 labels of ECOCLIMAP-II [4] but many more than many existing LULC maps. These limitations mean that no high-resolution (finer than 100 m) dataset with ECOSG labels is currently available globally, or even for Europe.

Machine learning (ML) is well suited to help tackle these limitations. Previous work on improving ECOSG with ML showed potential for producing a LULC map with simpler labels [1] and for extracting building heights to better describe urban areas [15]. However, until now, these ML techniques could not be applied to identify the 33 ECOSG labels and produce a higher-resolution map, as no reference dataset was available to train and validate the ML model.

This paper, part one of a two-part publication, introduces an agreement-based method for producing a 60 m resolution reference LULC map with ECOSG labels by leveraging existing suitable datasets; we call this map ECOCLIMAP-SG+, or ECOSG+ hereafter. Moreover, the agreement between maps is used to quantify the uncertainty of the land cover labels, information that is rarely provided although it is very useful in some applications. In particular, it makes it easy to build a reference dataset for machine learning. This will be done in the second part of this publication, which focuses on improving ECOSG+ where its uncertainty is high. Section 2 presents the datasets and methodology used to produce ECOSG+. The results follow in Section 3, while the conclusions are provided in Section 4.

2. Materials and Methods

ECOSG+ is built by mixing a large number of existing land cover maps based on the agreement between them. This section details which land cover maps were used and how the agreement-based mix is performed.

2.1. Material

2.1.1. Primary and Secondary Labels

In the construction and evaluation of ECOSG+, we use two sets of labels: primary labels, $L_{1}$, and secondary labels, $L_{2}$. The sets $L_{1}$ and $L_{2}$ are such that for each $l_{2} \in L_{2}$ there is a unique $l_{1} \in L_{1}$. The function for the hierarchical link is denoted $h$ as in Equation (1).

$$h : l_{2} \in L_{2} \mapsto h(l_{2}) = l_{1} \in L_{1} \tag{1}$$

Figure 1 gives the sets of labels $L_{1}$ and $L_{2}$ and the hierarchy between these.

While we aim to create ECOSG+ with the set of secondary labels $L_{2}$, the method starts with $L_{1}$ to aid the comparison of maps with heterogeneous semantic coverage. More specifically, some maps focus on primary labels (we refer to these as backbone maps), and some maps focus on some secondary labels (referred to as specialist maps), although the focus is not always exclusive (some maps are both backbone and specialist maps).

2.1.2. Land Cover Maps

The data used in the construction of ECOSG+ is a set of LULC maps. We define a map as a function from a geographical location to a label as in Equation (2).

$$M : x \in D \mapsto l \in L \tag{2}$$

with the following definitions: $x$ is a geographical location, $D$ is the geographical domain of definition of the map, $l$ is a land cover label, and $L$ is the set of labels for this map (i.e., the semantic domain of definition).

For ECOSG+, the geographical domain of definition is the entire globe, denoted $D$. However, most of the maps used in the building of ECOSG+ are focused on Europe. Therefore, the evaluation is only done over Europe, and visualisation is restricted to the EURAT domain (longitudes: -32 to 42 degrees, latitudes: 20 to 72 degrees). The semantic domain of definition of ECOSG+ is the set of secondary labels $L_{2}$.

A total of 43 land cover maps were used in the construction of ECOSG+. They are listed in Table A1–Table A3. For details on the pre-processing and exceptions, we refer to Appendix D.1.

Backbone Maps

We define a backbone map as any map that provides all² the primary labels $L_{1}$ (preferably at a high resolution). We denote the total number of backbone maps available as $N_{bb}$. For any $i \in \{1, \ldots, N_{bb}\}$, the $i$-th backbone map $M_{bb}^{i}$ is, therefore, a function satisfying Equation (3).

$$M_{bb}^{i} : x \in D_{bb}^{i} \mapsto l_{1} \in L_{1} \tag{3}$$

where $D_{bb}^{i}$ is the geographical domain of the map $M_{bb}^{i}$. Note that the geographical domain varies from one backbone map to another. In this work, $N_{bb} = 16$ backbone maps were used. These are identified as $M_{bb}$ in Tables A1–A3.

Specialist Maps

We define a specialist map as any map that provides secondary labels. We denote the total number of specialist maps available as $N_{sp}$. For any $j \in \{1, \ldots, N_{sp}\}$, the $j$-th specialist map $M_{sp}^{j}$ satisfies:

$$M_{sp}^{j} : x \in D_{sp}^{j} \mapsto l_{2} \in L_{2}^{j} \tag{4}$$

where $D_{sp}^{j}$ is the geographical domain of definition and $L_{2}^{j} \subset L_{2}$ is the semantic domain of definition. As for backbone maps, the geographical coverage varies from one specialist map to another, as does the semantic domain of definition. In this work, $N_{sp} = 33$ specialist maps were used. They are identified as $M_{sp}$ in Tables A1–A3, which also contain the indices of the secondary labels in their semantic domain of definition.

2.2. Methods

2.2.1. Construction of ECOSG+

This section only describes the main steps of the construction, which are original to this work. For technical details and exceptions, please see Appendix D. The input data used for this method are the backbone maps $\{M_{bb}^{1}, \ldots, M_{bb}^{N_{bb}}\}$ and the specialist maps $\{M_{sp}^{1}, \ldots, M_{sp}^{N_{sp}}\}$ with their definition domains, as introduced in Section 2.1.2.

Definition of a Specialist Agreement Score

For any position $x \in D$ and secondary label $l_{2} \in L_{2}$, we define the specialist agreement score, $S_{sp}(x, l_{2})$, as in Equation (5).

$$S_{sp}(x, l_{2}) = \frac{\sum_{j=1}^{N_{sp}} \mathbb{1}(x \in D_{sp}^{j} \land l_{2} \in L_{2}^{j} \land M_{sp}^{j}(x) = l_{2})}{\sum_{j=1}^{N_{sp}} \mathbb{1}(x \in D_{sp}^{j} \land l_{2} \in L_{2}^{j})} \tag{5}$$

with $\mathbb{1}$ the indicator function returning 1 if its argument is true and 0 otherwise, and $\land$ the logical "and" operator. Therefore, the score $S_{sp}(x, l_{2})$ is the ratio of the number of specialist maps that agree with the label $l_{2}$ at $x$ to the number of maps that could provide this information. For example, if we have a position $x$ for which 4 maps can give the secondary label $l_{2}$ = "19. Winter C3 crops" (i.e., $\sum_{j=1}^{N_{sp}} \mathbb{1}(x \in D_{sp}^{j} \land l_{2} \in L_{2}^{j}) = 4$) but only 3 maps actually give this label (i.e., $\sum_{j=1}^{N_{sp}} \mathbb{1}(x \in D_{sp}^{j} \land l_{2} \in L_{2}^{j} \land M_{sp}^{j}(x) = l_{2}) = 3$), then we have a specialist agreement score of 0.75 (i.e., $S_{sp}(x, l_{2}) = 3/4$). The specialist agreement score ranges between 0 and 1 and reflects the confidence level on the information provided by the specialist maps: the higher the value of $S_{sp}(x, l_{2})$, the more confident we are that the label $l_{2}$ is correct at $x$. Exceptions to Equation (5) are listed in Appendix D.2.
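The counting in Equation (5) can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes that all specialist maps have already been resampled to a common grid, that the integer `NO_DATA` marks pixels outside a map's geographical domain, and that the score is set to 0 where no map can provide the label. The names `specialist_agreement`, `maps` and `label_sets` are hypothetical.

```python
import numpy as np

NO_DATA = 0  # assumption: value stored outside a map's geographical domain


def specialist_agreement(maps, label_sets, label):
    """Sketch of Equation (5) on a common grid.

    maps: list of integer arrays of identical shape (one per specialist map)
    label_sets: list of sets, the semantic domain of each specialist map
    label: the secondary label to score
    """
    shape = maps[0].shape
    agree = np.zeros(shape)  # numerator: maps that give `label`
    able = np.zeros(shape)   # denominator: maps that could give `label`
    for m, labels in zip(maps, label_sets):
        if label not in labels:
            continue  # this map cannot provide the label at all
        in_domain = m != NO_DATA
        able += in_domain
        agree += in_domain & (m == label)
    # Assumption: the score is 0 where no map can provide the label
    return np.divide(agree, able, out=np.zeros(shape), where=able > 0)


# Toy example on a 1 x 2 grid: four crop maps and one water map
maps = [
    np.array([[19, 19]]),
    np.array([[19, 20]]),
    np.array([[19, NO_DATA]]),
    np.array([[20, NO_DATA]]),
    np.array([[2, 2]]),
]
label_sets = [{19, 20}, {19, 20}, {19, 20}, {19, 20}, {1, 2, 3}]
score = specialist_agreement(maps, label_sets, label=19)
```

In the first pixel, four maps can provide label 19 and three of them do, which reproduces the 3/4 example of the text; the water map is ignored because label 19 is outside its semantic domain.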

Refinement of Backbone Maps

For each backbone map, we create a map giving secondary labels instead of primary labels. This process results in a so-called refined map for each backbone map. For any $i \in \{1, \ldots, N_{bb}\}$ we write $M_{rf}^{i}$, the refined map of the $i$-th backbone map, as:

$$M_{rf}^{i} : x \in D_{rf}^{i} \mapsto \underset{l_{2} \in L_{2},\, h(l_{2}) = M_{bb}^{i}(x)}{\operatorname{argmax}} \; S_{sp}(x, l_{2}) \in L_{2} \tag{6}$$

with $S_{sp}$ the specialist agreement score (see Equation (5)), $h$ the hierarchical link between secondary and primary labels (see Equation (1)) and

$$D_{rf}^{i} = \{x \in D_{bb}^{i} : \max_{l_{2} \in L_{2},\, h(l_{2}) = M_{bb}^{i}(x)} S_{sp}(x, l_{2}) \neq 0\} \tag{7}$$

The refined map $M_{rf}^{i}$ returns the secondary label with the highest specialist agreement score, while satisfying the hierarchical link with the primary label given by the backbone map $M_{bb}^{i}$. For example, let us consider the ESA WorldCover v200 backbone map and a position $x$ where the map gives the primary label "Water bodies" (i.e., $M_{bb}(x) =$ "Water bodies"). The associated refined map will return whichever of the following secondary labels $l_{2} \in$ {"1. Sea and oceans", "2. Lakes", "3. Rivers"} (i.e., $h(l_{2}) = M_{bb}(x)$) has the highest $S_{sp}(x, l_{2})$.

The domain of definition of a refined map, $D_{rf}^{i}$, is smaller than that of the corresponding backbone map, $D_{bb}^{i}$, because refinement is not possible everywhere. When the highest specialist agreement score is zero, there are no specialist maps that provide the relevant secondary labels. Therefore, refinement is impossible.
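The refinement of Equations (6) and (7) at a single position can be sketched as below. This is only an illustration under stated assumptions: the dictionary `H` is a partial excerpt of the hierarchical link $h$ containing only the "Water bodies" children quoted in the text, the function name `refine` is hypothetical, and ties are broken by taking the first candidate, which the text does not specify (exceptions are deferred to Appendix D).

```python
# Partial excerpt of the hierarchical link h (secondary -> primary label).
# Only the "Water bodies" children named in the text are included.
H = {
    "1. Sea and oceans": "Water bodies",
    "2. Lakes": "Water bodies",
    "3. Rivers": "Water bodies",
}


def refine(primary_label, scores, h=H):
    """Sketch of Equations (6)-(7) at one position.

    primary_label: label given by the backbone map at this position
    scores: dict mapping secondary labels to their specialist agreement score
    Returns the refined secondary label, or None when refinement is
    impossible (position outside the refined domain).
    """
    candidates = [l2 for l2, l1 in h.items() if l1 == primary_label]
    if not candidates:
        return None
    best = max(candidates, key=lambda l2: scores.get(l2, 0.0))
    # Equation (7): the position is excluded when the best score is zero
    return best if scores.get(best, 0.0) > 0 else None


refined = refine(
    "Water bodies",
    {"1. Sea and oceans": 0.0, "2. Lakes": 0.75, "3. Rivers": 0.25},
)
unrefined = refine("Water bodies", {})
```

Here `refined` is "2. Lakes", the water label with the highest specialist agreement, while `unrefined` is `None` because no specialist information is available.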

We define a refined agreement score, $S_{rf}(x, l_{2})$, for any position $x$ and secondary label $l_{2}$ as in Equation (8).

$$S_{rf}(x, l_{2}) = \frac{\sum_{i=1}^{N_{bb}} \mathbb{1}(x \in D_{rf}^{i} \land M_{rf}^{i}(x) = l_{2})}{\max_{x} \left\{\sum_{i=1}^{N_{bb}} \mathbb{1}(x \in D_{rf}^{i})\right\}} \tag{8}$$

Therefore, $S_{rf}(x, l_{2})$ is the ratio of the number of refined maps that agree with the label $l_{2}$ at $x$ to the maximum number of overlapping maps. For example, in our case $N_{bb} = 16$, but because the definition domains of some refined maps are mathematically disjoint, no more than 9 refined maps overlap (i.e., $\max_{x} \sum_{i=1}^{N_{bb}} \mathbb{1}(x \in D_{rf}^{i}) = 9$). If we have a position $x$ for which 4 refined maps give the secondary label $l_{2}$ = "19. Winter C3 crops" (i.e., $\sum_{i=1}^{N_{bb}} \mathbb{1}(x \in D_{rf}^{i} \land M_{rf}^{i}(x) = l_{2}) = 4$), then we have a refined agreement score of 0.44 (i.e., $S_{rf}(x, l_{2}) = 4/9$). Note that the denominator is constant. This choice ensures that an area with more available refined maps gets a higher score. Exceptions in the refinement process are listed in Appendix D.3.
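The effect of the constant denominator in Equation (8) can be seen in a small NumPy sketch. It assumes that the refined maps share a common grid and that the integer `NO_DATA` marks positions outside a refined map's domain; the function name `refined_agreement` is hypothetical.

```python
import numpy as np

NO_DATA = 0  # assumption: value stored outside a refined map's domain


def refined_agreement(refined_maps, label):
    """Sketch of Equation (8) on a common grid."""
    stack = np.stack(refined_maps)             # shape (n_maps, ny, nx)
    available = stack != NO_DATA
    # Constant denominator: largest number of overlapping refined maps
    max_overlap = available.sum(axis=0).max()
    if max_overlap == 0:
        return np.zeros(stack.shape[1:])
    votes = (available & (stack == label)).sum(axis=0)
    return votes / max_overlap


# Toy example on a 1 x 2 grid with three refined maps
refined_maps = [
    np.array([[19, 19]]),
    np.array([[19, NO_DATA]]),
    np.array([[20, NO_DATA]]),
]
s_rf = refined_agreement(refined_maps, label=19)
```

In the second pixel the only available map gives label 19, yet the score is 1/3 rather than 1, because the count is divided by the maximum overlap (3 here). The first pixel, covered by all three maps, reaches 2/3.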

Best-Guess Map

The ensemble of refined maps is then used to create a single best-guess map. This is done by taking the label $l_{2}$ with the highest refined agreement score $S_{rf}(x, l_{2})$. The resulting map, $M^{*}$, is defined as follows³:

$$M^{*} : x \in D^{*} \mapsto \underset{l_{2} \in L_{2}}{\operatorname{argmax}} \; S_{rf}(x, l_{2}) \in L_{2} \tag{9}$$

with $S_{rf}$ the refined agreement score (see Equation (8)) and

$$D^{*} = \{x \in D : \max_{l_{2} \in L_{2}} S_{rf}(x, l_{2}) \neq 0\} \tag{10}$$

We then extend the definition domain of $M^{*}$ to the whole globe $D$ by inserting the label "0. No data" for all $x \notin D^{*}$. The extended map is also denoted $M^{*} : D \to L_{2}$.
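Equations (9) and (10), together with the "0. No data" extension, amount to an argmax over labels followed by a mask. The sketch below assumes the refined agreement scores are stored as one 2-D array per label, stacked along the first axis; the names `best_guess`, `score_stack` and `labels` are hypothetical, and ties are resolved in favour of the first label in `labels`, which is an assumption.

```python
import numpy as np


def best_guess(score_stack, labels, no_data=0):
    """Sketch of Equations (9)-(10) with the "0. No data" extension.

    score_stack: array of shape (n_labels, ny, nx) of refined agreement scores
    labels: sequence of the n_labels secondary label indices
    """
    labels = np.asarray(labels)
    best = labels[np.argmax(score_stack, axis=0)]
    covered = score_stack.max(axis=0) > 0   # positions inside D*
    return np.where(covered, best, no_data)


# Toy example: two labels (19 and 20) on a 1 x 2 grid
score_stack = np.array([
    [[2 / 3, 0.0]],   # scores for label 19
    [[1 / 3, 0.0]],   # scores for label 20
])
m_star = best_guess(score_stack, labels=[19, 20])
```

The first pixel receives label 19, and the second pixel, where every score is zero, receives the no-data label 0.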

Quality Assessment

The quality of $M^{*}(x)$ depends on the refinement process and the construction of the best-guess map. The uncertainties of these steps are represented by the specialist agreement score $S_{sp}$ and the refined agreement score $S_{rf}$. For any position $x$, we define the quality score, $S(x)$, as in Equation (11).

$$S(x) = \sqrt{S_{rf}(x, M^{*}(x)) \, S_{sp}(x, M^{*}(x))} \tag{11}$$

with the following definitions:

$M^{*}$ is the best-guess map, defined in Equation (9),
$S_{rf}$ is the refined agreement score, defined in Equation (8),
$S_{sp}$ is the specialist agreement score, defined in Equation (5).

The quality score $S(x)$ is the geometric mean of the uncertainty caused by disagreement in the backbone maps (represented by their refined counterpart) and the uncertainty due to disagreement in the specialist maps. This is the score we use to estimate the uncertainty of the label given by ECOSG+ at any position $x$. Note that when $M^{*}(x) =$ "0. No data", the score $S(x) = 0$ because $S_{sp}(x, \text{"0. No data"}) = 0$ as defined in Equation (5).
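As a worked illustration of Equation (11), suppose that a single position had the two example values used earlier, $S_{rf} = 4/9$ and $S_{sp} = 3/4$. Pairing these two values is an assumption made here for illustration only; they come from separate examples in the text, not from an actual location.

$$S(x) = \sqrt{\frac{4}{9} \times \frac{3}{4}} = \sqrt{\frac{1}{3}} \approx 0.58$$

Because the geometric mean is a product under a square root, a zero in either score gives $S(x) = 0$ regardless of the other: for instance, $S_{rf} = 1$ and $S_{sp} = 0$ yield $S(x) = 0$.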

Assembling

The last step in producing ECOSG+ is to assemble the available information, namely the best-guess map $M^{*}$, the quality score $S$ and the ECOSG map. The assembled ECOSG+ takes the best-guess map $M^{*}$ where the quality score $S$ is higher than a threshold $S_{min}$, and uses ECOSG elsewhere. If we denote the map returning the labels of ECOSG (resp. ECOSG+) by $M_{sg}$ (resp. $M_{sg+}$), we have:

$$M_{sg+}(x) = \begin{cases} M^{*}(x) & \text{if } S(x) > S_{min} \\ M_{sg}(x) & \text{otherwise} \end{cases} \tag{12}$$

The $S_{min}$ threshold was determined using a histogram of $S(x)$ for $x$ covering the EURAT domain at 0.1° resolution. Looking for a gap in this histogram with the Otsu method gave $S_{min} = 0.525$.
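The thresholding and assembling steps can be sketched as follows. The `otsu_threshold` function is a generic textbook version of Otsu's method (maximising the between-class variance of a histogram), written here with NumPy only; it is not the authors' code, and the number of bins and the choice of returning the upper edge of the selected bin are assumptions. The `assemble` function is a direct transcription of Equation (12).

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Generic Otsu threshold for scores in [0, 1] (illustrative sketch)."""
    hist, edges = np.histogram(values, bins=bins, range=(0.0, 1.0))
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)                  # pixel count in the lower class
    w1 = w0[-1] - w0                      # pixel count in the upper class
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1)          # mean of the lower class
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance (unnormalised)
    k = int(np.argmax(between))
    return float(edges[k + 1])            # upper edge of the last lower-class bin


def assemble(best_guess, ecosg, quality, s_min):
    """Equation (12): best-guess label where quality > s_min, ECOSG elsewhere."""
    return np.where(quality > s_min, best_guess, ecosg)


# Toy bimodal score sample: the threshold falls in the gap between the modes
sample = np.concatenate([np.full(50, 0.1), np.full(50, 0.9)])
t = otsu_threshold(sample)

# Assembling with the threshold value reported in the text
ecosg_plus = assemble(
    best_guess=np.array([[19, 19]]),
    ecosg=np.array([[20, 20]]),
    quality=np.array([[0.8, 0.3]]),
    s_min=0.525,
)
```

In the toy assembling example, the first pixel keeps the best-guess label (score 0.8) and the second falls back to ECOSG (score 0.3).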

2.2.2. Evaluation of ECOSG+

As for all land cover maps, the evaluation of ECOSG+ is complicated by the heterogeneity of resolutions and of semantic and geographical coverage. Existing methods for evaluating land cover maps include:

Comparison to derived measurable quantities. For example, [16] trained a machine learning model to derive the skin temperature from the land cover, and compared the derived skin temperature to the measured skin temperature. This method allows quantitative evaluation of the land cover maps of large domains but requires measured quantities at a comparable resolution over a comparable domain.
Human validation. For example, LUCAS [17] and CLC+ [18] are validated by human experts. In the case of LUCAS, experts went to designated sites to verify the land cover. In the case of CLC+, experts validated the land cover classes by photo-interpretation. In both cases, human validation requires a sufficient number of trained staff and a carefully designed validation procedure.
Comparison to trusted land cover maps. For example, [1,19] trained and validated a machine learning model using the CORINE land cover map. In this method, the quality of the evaluation is dependent on the quality of the trusted map. It, therefore, requires the existence of a trusted map of proven quality with an appropriate set of labels, and preferably a higher spatial resolution and greater detail than the map being assessed. Although this is considered less accurate than human validation [20], this method validates every pixel on the trusted map domain.

In our case, human validation was not possible because of limitations in time and expertise. The comparison to derived measurable quantities was not investigated because of the lack of measured quantities at 60 m resolution covering significant parts of Europe. Therefore, we chose to perform a comparison with trusted land cover maps used as references.

Reference Maps

The trusted land cover maps used are the following:

LUCAS 2022: In-situ data at validated sites over all of Europe translated to primary labels (see Table A7). A translation to secondary labels is not possible.
NLC 2018: A raster map at 10 m resolution, created by the National Mapping Division of Tailte Éireann (formerly Ordnance Survey of Ireland) in partnership with the Irish Environmental Protection Agency (EPA), and translated to primary labels (see Table A8). A translation to secondary labels is not possible, and the map only covers Ireland.
ECOSGIMO: A raster map at 25 m (provided at 60 m) resolution providing secondary labels, using national datasets and expert rules. Most of the covers over nature come from a habitat classification map [21] from the Icelandic Institute of Natural History (IINH) based on the EUNIS classification system. The habitat types were translated to secondary labels for Snow, Water bodies, Bare land, Grassland, Crops and Flooded vegetation. The secondary labels for Forests and Shrubs are based on data from the Icelandic Forest Service - Icelandic Forest Research, Mógilsá. Two maps were used, i.e., a map of native birch forests and shrubs, and a map of afforestation with different coniferous and broadleaf species. The urban local climate zone labels come from the CORINE Land Cover 2018 [22] with a few updates in Reykjavík city. Recent lava fields were added as rocks with data from the Icelandic Meteorological Office and other national institutes in Iceland.

None of the reference maps were included in the construction process, which avoids bias in the evaluation. Three components of ECOSG+ are tested with the three reference maps: accuracy across Europe (LUCAS), accuracy of small-scale features (NLC 2018), accuracy of secondary labels (ECOSGIMO). However, these three components are not evaluated together, and each has limitations (only discrete sites in LUCAS 2022, only the geographical region of Ireland in NLC 2018, and only Icelandic secondary labels in ECOSGIMO).

Comparison Scores

To quantify the similarity between two maps, we compute the confusion matrix on the labels4 for all pixels. Then, we compute the overall accuracy (OA) for an overall comparison, and the F1-score for a per-label comparison.

The confusion matrices shown in Section 3 are normalized row-wise by the number of pixels with the row’s label in the reference map. Therefore, the diagonal of these matrices shows the recall (or producer accuracy) value for each label.
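The comparison scores described above can be computed from the confusion matrix as in the following sketch. It assumes both maps are given as integer label arrays on the same grid, with labels numbered from 0 to `n_labels - 1`; the function names are hypothetical, and scores for labels absent from a map are set to 0 by convention.

```python
import numpy as np


def confusion_matrix(reference, predicted, n_labels):
    """Rows: labels of the reference map; columns: labels of the evaluated map."""
    cm = np.zeros((n_labels, n_labels), dtype=np.int64)
    np.add.at(cm, (reference.ravel(), predicted.ravel()), 1)
    return cm


def comparison_scores(cm):
    """Overall accuracy, per-label F1-score and row-normalised matrix."""
    tp = np.diag(cm).astype(float)
    ref_total = cm.sum(axis=1)    # pixels per label in the reference map
    pred_total = cm.sum(axis=0)   # pixels per label in the evaluated map
    oa = tp.sum() / cm.sum()
    recall = np.divide(tp, ref_total, out=np.zeros_like(tp), where=ref_total > 0)
    precision = np.divide(tp, pred_total, out=np.zeros_like(tp), where=pred_total > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    row_norm = np.divide(
        cm, ref_total[:, None], out=np.zeros(cm.shape), where=ref_total[:, None] > 0
    )
    return oa, f1, row_norm


# Toy example with three labels
reference = np.array([0, 0, 0, 1, 1, 2])
predicted = np.array([0, 0, 1, 1, 1, 0])
cm = confusion_matrix(reference, predicted, n_labels=3)
oa, f1, row_norm = comparison_scores(cm)
```

The diagonal of `row_norm` is the recall of each label, which is why the diagonal of the row-normalised matrices can be read as producer accuracy.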

Baseline Maps

Once the comparison scores were computed for ECOSG+ against the reference maps, the values of the scores were compared to the ones obtained using baseline maps. The baseline maps considered are ECOSG, ECOSG+300 and ESA WorldCover v200 ([23], hereafter ESA WorldCover).

ECOSG is used as a baseline to quantify the improvement on the land cover map currently used in the HARMONIE-AROME NWP model.
ECOSG+300 refers to ECOSG+ resampled at ECOSG's native resolution of 300 m. This baseline aims to show whether the improvement is due to the increase in resolution or the correction of some labels.
ESA WorldCover v200 is one of the most commonly used land cover maps, and therefore makes a good standard.

In addition to the quantitative evaluation described here, a qualitative evaluation is also carried out. It consists of visualizing ECOSG+ and its quality score at different scales and calculating basic statistics about the labels.

3. Results

The results are presented as introduced in Section 2.2.2: first, a qualitative evaluation of ECOSG+ and its quality score (Section 3.1), then a quantitative evaluation. The quantitative evaluation involves testing three components of ECOSG+ separately, marked by their reference datasets: a continental scale evaluation is performed with LUCAS 2022 as a reference (Section 3.2.1), a small scale evaluation is performed with NLC 2018 as a reference (Section 3.2.2), and a secondary label evaluation is performed with ECOSGIMO as a reference (Section 3.2.3).

3.1. Qualitative Evaluation

3.1.1. Overview of the ECOSG+ Map and Its Quality Score

Figure 2 shows an overview of the ECOSG+ map (upper panel) and its associated quality score map (lower panel). It is obtained by taking the nearest neighbour value on a regular longitude–latitude grid (EPSG:4326) covering the EURAT domain at 0.1°. The colormap of the land cover labels is the same as for ECOSG. Only labels actually present on the map are listed in the color bar. The colormap of the scores transitions from red (low values, which means low confidence on the label) to green (high values, which means high confidence). The cut-off value in the colormap is the same as the threshold $S_{min}$ used in assembling the map (see Equation (12)): 0.525. Therefore, the labels of ECOSG+ come from ECOSG where the scores are in red, and they come from the best-guess map $M^{*}$ (see Equation (9)) where the scores are in green.

On the land cover map (upper panel of Figure 2), little can be said at this scale, except that no obviously wrong labels were found. We retrieve the expected main features: credible coastlines, a large area of desert in Northern Africa, and a complex tiling of covers everywhere else.

On the score map (lower panel of Figure 2), the main red areas are over the Atlantic Ocean, Eastern Europe and Mediterranean countries. The low scores over the Atlantic Ocean are due to the common practice of removing data over large sea areas. Consequently, despite the low score values, we are confident that the "1. Sea and oceans" label taken from ECOSG over this area is correct. In Eastern Europe, low scores are expected as fewer datasets representing these areas were included in the method. This is also the case for some Mediterranean countries, such as Turkey, Morocco and Algeria. However, some regions are in red, despite having a good coverage of datasets. For example, southern France and Portugal are depicted in red despite using the national land cover maps there, which suggests disagreement between all the datasets used in these areas.

The main green areas are the coastline, the deserts and most European countries. The coastline is usually well represented in all land cover maps, and some of the maps used in this work have a resolution as high as 10 m, which explains the good agreement on the coastline. A large number of maps cover most European countries, explaining the high confidence level in these areas. Surprisingly, the deserts are shown in light green, which suggests a satisfactory confidence level, despite the limited number of datasets available. This result can be explained by the limited number of labels covering the deserts ("4. Bare land" and "5. Bare rock"), which reduces the risk of disagreement compared to areas covered by more labels. Therefore, the numerator in the specialist agreement score (see Equation (5)) is most likely high. This high numerator is divided by a small number of maps available, leading to a large score and, consequently, a high confidence level.

Other noticeable features on the score map are the numerous artefacts due to the manipulation of many maps with heterogeneous projections and boundaries. We can see stitching artefacts (regularly spaced red horizontal lines) and reprojection artefacts (patterns in the North and Arctic Seas). Noticeably, these artefacts are not visible on the land cover map, at least at this scale, which is an encouraging result for the ECOSG+ construction method.

3.1.2. Distribution of Labels

Figure 3 represents the distribution of labels over the EURAT domain for ECOSG+ (outer ring) and ECOSG (inner ring). In addition, the proportion of pixels with scores exceeding the $S_{min} = 0.525$ threshold has been estimated as 33.79%, which means that 33.79% of the pixels of ECOSG+ take their label from a source other than ECOSG. The pie chart names only the most common labels (those with 2% coverage or more). The first visible feature is the similar distribution of labels between ECOSG+ and ECOSG, which is expected at this scale. Therefore, although 33.79% of pixels take their label from a different source, the distribution of labels is barely modified, which supports the validity of ECOSG+. The second visible feature is the strong imbalance among the land cover labels. On the one hand, four labels cover more than 81% of the pixels: "1. Sea and oceans" (52%), "4. Bare land" (17%), "19. Winter C3 crops" (7%) and "12. Boreal needleleaf evergreen" (5%). On the other hand, the 15 least dominant labels cover less than 1% of the pixels. In particular, urban areas (LCZs 1 to 10) represent 0.9% of the pixels.
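Statistics of this kind reduce to pixel counts. The sketch below shows one way to obtain the share of each label and the fraction of pixels above the threshold from two arrays on the same grid. The function names are hypothetical, and the sketch counts pixels without any area weighting, which is an assumption about how the figures were derived.

```python
import numpy as np


def label_shares(label_map):
    """Fraction of pixels carrying each label (no area weighting)."""
    labels, counts = np.unique(label_map, return_counts=True)
    return dict(zip(labels.tolist(), (counts / counts.sum()).tolist()))


def fraction_above(quality, s_min=0.525):
    """Fraction of pixels whose quality score exceeds the threshold."""
    return float(np.mean(quality > s_min))


# Toy example on a 1 x 4 grid
shares = label_shares(np.array([[1, 1, 4, 19]]))
above = fraction_above(np.array([[0.1, 0.6, 0.9, 0.5]]))
```

In this toy case, label 1 covers half of the pixels and half of the pixels exceed the 0.525 threshold.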

3.1.3. Zoom on a Few Patches

Figure 4 shows examples of land cover patches for several land cover maps (one per column) at several locations (one per row). Each row represents a geographical area and is identified on the left-hand side by a toponym, the country it is in, and the longitude-latitude coordinates of the central point. The first row is the Snaefell Glacier in Iceland. The second row is centred on Nanterre, France, in the north-western part of the Paris urban area. The third row shows the small islands called Kihdinluoto in the south-west of Finland. The fourth row is a rural part of Portugal, around the small town of Pinhel. The fifth and last row is the oasis town of El Menia in the Sahara desert (Algeria). The geographical areas have been chosen to display various landscapes and latitudes within the EURAT domain. All patches are 0.0833° in size, representing approximately 8 km at low latitudes.

Each column represents a different land cover map. The first column is ESA WorldCover, one of the backbone maps with global coverage and 10 m resolution, shown here to verify the primary labels. The second column is ECOSG, which is currently used in NWP models and is the baseline to improve upon. The colormap for the ECOSG labels is the same as in Figure 1, Figure 2 and Figure 3. The third column is ECOSG+, $M_{sg+}$ (see Equation (12)), the final map of this work. The fourth column is the best-guess map, $M^{*}$ (see Equation (9)). The fifth and last column is the ECOSG+ quality score map, $S$ (see Equation (11)), used in the assembling of ECOSG+. The colormap for the score values is the same as in Figure 2, with green indicating where ECOSG+ takes values from $M^{*}$ instead of ECOSG.

Figure 4 illustrates the gain in resolution between ECOSG and ECOSG+ and the label correction (e.g., over Paris, row 2). The best-guess map shows even more pronounced differences with ECOSG and is very often in agreement with ESA WorldCover, which is encouraging. However, some missing values remain where the score is 0 (deep red, e.g., the crops over Portugal in row 4, or the Algerian lake and town in row 5). This justifies the need for a cut-off value for the score. As ECOSG is taken where the score is below the cut-off value, the areas with the lowest scores also have a lower resolution.

3.2. Quantitative Evaluations

3.2.1. Europe-Wide Evaluation against LUCAS

The ESA WorldCover, ECOSG and new ECOSG+ maps were evaluated over the European Union using the LUCAS 2022 dataset [17]. In this comparison, the LUCAS points were expanded to a 60 m radius, and the LUCAS labels were translated according to the C3 classification and Table A7. To rule out resolution changes as a factor for the differences between ECOSG and ECOSG+, ECOSG+ was downsampled to 300 m (ECOSG+300) using the most frequent land cover type in each 300 m square grid.
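Downsampling by most frequent label can be sketched as a block-wise mode. The example below assumes a 60 m label array aligned with the 300 m grid, so that each coarse cell contains 5 x 5 fine pixels; the function name `downsample_mode` is hypothetical, incomplete blocks at the edges are dropped, and ties are resolved in favour of the smallest label index. The last two choices are assumptions, not details taken from the text.

```python
import numpy as np


def downsample_mode(labels, factor=5):
    """Most frequent label in each factor x factor block (illustrative sketch)."""
    ny, nx = labels.shape
    ny, nx = ny - ny % factor, nx - nx % factor   # drop incomplete edge blocks
    blocks = labels[:ny, :nx].reshape(ny // factor, factor, nx // factor, factor)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(ny // factor, nx // factor, -1)
    out = np.empty(blocks.shape[:2], dtype=labels.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            values, counts = np.unique(blocks[i, j], return_counts=True)
            out[i, j] = values[np.argmax(counts)]  # ties: smallest label wins
    return out


# Toy example: a 5 x 10 fine grid becomes a 1 x 2 coarse grid
fine = np.full((5, 10), 19)
fine[:, 5:] = 20
fine[0, :3] = 7          # a small feature that disappears at 300 m
coarse = downsample_mode(fine, factor=5)
```

The three pixels labelled 7 are outvoted by label 19 in their block, which illustrates how small-scale features are lost in ECOSG+300.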

Figure 5 shows the confusion matrices of ECOSG, ESA WorldCover, ECOSG+ and ECOSG+300. Primary labels of the reference dataset (LUCAS) are on the y-axis and those of the evaluated dataset are on the x-axis. To compensate for label imbalance, the matrices have been normalised row-wise by the number of pixels per label in the reference dataset. The figure also shows each map’s associated overall accuracies (OA). For every dataset, "Forest", "Grassland", and "Crops" display the most off-diagonal spread in values, with a clear confusion between "Bare land" and "Crops". ECOSG+, ECOSG+300 and ESA WorldCover exhibit similar behaviour with higher values on the diagonal for "Grassland", and more "Grassland" versus "Shrubs" confusion than in ECOSG. The OA reflects these two groups with ECOSG+, ECOSG+300 and ESA WorldCover having a value of over 0.5, while ECOSG has a value of 0.42. These observations suggest that the agreement-based method improved the overall representation of primary labels, and these observations are not only due to the resolution increase, as ECOSG+300 exhibits similar behaviour to ECOSG+.

Table 1 displays the F1-score for ECOSG, ESA WorldCover, ECOSG+ and ECOSG+300 with LUCAS 2022 as a reference. ECOSG+ has the highest F1-score for "Bare land", "Snow", and "Flooded vegetation", and outperforms ECOSG for every label. While the impact of the downsampling leads to lower F1-scores, ECOSG+300 outperforms ECOSG for every label except "Shrubs". This shows that ECOSG+ improves the overall representation of ECOSG primary labels over Europe.

3.2.2. Small Scale Feature Evaluation against NLC 2018

In Section 3.2.1, we assessed the overall representation of the primary labels over Europe. In this section, we compare ECOSG+ against a dataset for Ireland to assess whether ECOSG+ represents the high-resolution detail well. Produced by the EPA and Taillte Ireland, NLC 2018 covers the Republic of Ireland. It has two classification levels, with thematic accuracies of 78.5% and 88.7% at Level 2 and Level 1, respectively, while the geometric accuracy (i.e., the area outline) is 87.2%.

To perform the analysis, the NLC 2018 data were rasterised on a 60 m grid and converted to primary labels following Table A8. We note that the NLC 2018 labels are not a perfect fit for primary labels, especially for the "Flooded vegetation" types. Thus, an assessment of the NLC 2018 primary label conversion is needed. This assessment was performed over Ireland using LUCAS as a reference. The assessment, presented in Appendix E, determined that NLC 2018 is a suitable reference map for the Republic of Ireland after being transformed into primary labels. However, it has been observed that NLC 2018 classifies "Shrubs" as "Flooded vegetation", meaning that the outcomes for these two primary labels should be viewed with caution.

Figure 6 shows the row-wise normalised confusion matrices for ECOSG+, ECOSG and ESA WorldCover with NLC 2018 as a reference. The dark blue column over grassland indicates the large spread in the grassland classification, which is consistent with the LUCAS confusion matrix (Appendix E). ECOSG+ and ESA WorldCover classify most of the NLC 2018 "Flooded vegetation" as "Grassland", while there is good agreement between ECOSG and NLC 2018 for flooded grassland. As "Flooded vegetation" is overestimated in NLC 2018, the relative underestimation in ECOSG+ and ESA WorldCover suggests a better representation of "Flooded vegetation" than in ECOSG. The forest land cover in ECOSG+ and ESA WorldCover corresponds well to NLC 2018 forest, while ECOSG overestimates "Grassland".

Table 2 shows the F1-scores over the Republic of Ireland for ECOSG+, ECOSG and ESA WorldCover, using NLC 2018 as a reference. ECOSG+ has the highest F1-score, ahead of ESA WorldCover, for "Forests", "Grassland", "Crops" and "Urban" areas. Meanwhile, ECOSG is a better fit for "Bare land" and "Flooded vegetation", which were the least accurate in NLC 2018 compared to LUCAS (Appendix E). This suggests that the agreement-based method correctly captures most primary labels over Ireland, even when compared to a higher resolution reference. The quality of the "Bare land", "Shrubs" and "Flooded vegetation" labels remains questionable in all tested maps.

3.2.3. Secondary Label Evaluation against ECOSGIMO

In the previous sections, we evaluated ECOSG+ primary labels over Europe and its level of detail over the Republic of Ireland. This comparison is incomplete, as this work aims to create a reference dataset with the 33 secondary labels. An evaluation at a secondary label level is necessary. In this section, we will compare ECOSG+ secondary labels with a high-resolution version of ECOSG called ECOSGIMO. ECOSGIMO is the only high-resolution version of ECOSG available. However, ECOSGIMO does not have an accuracy assessment and its coverage over Iceland is outside the LUCAS domain, or that of any known in-situ datasets. Hence, we cannot verify the accuracy of ECOSGIMO. Rather than providing an absolute assessment of the ECOSG+ secondary labels, this section aims to demonstrate how closely the ECOSG+ agreement-based method can approximate a more traditional technique.

Figure 7 shows the row-wise normalised confusion matrices for ECOSG+ and ECOSG with ECOSGIMO as a reference. For visibility purposes, labels not represented over Iceland were removed from Figure 7. Therefore, we can only evaluate 20 labels. The overall accuracies are similar and high compared to the previous section due to the imbalance of label distribution in the Icelandic domain. Indeed, 48% of the domain is covered by "1. Sea and oceans" and "6. Permanent snow", and the dark blue pixel on the diagonal for these labels reveals a strong agreement of ECOSG+ or ECOSG with ECOSGIMO, which increases the proportion of correctly represented labels.

The dashed squares in Figure 7 indicate the primary labels, and the diagonal in the "Water bodies" square indicates that water bodies in both ECOSG+ and ECOSG agree with ECOSGIMO at a primary level. Still, at a secondary level, rivers are better represented in ECOSG+ than in ECOSG. In the "Bare land" square, ECOSG+ agrees more at a primary level but is more confused at a secondary level. For "Forests", "Shrubs" and "Grassland", both maps overestimate "Grassland", but ECOSG+ preferentially returns "16. Boreal grassland" while ECOSG preferentially returns "17. Temperate grassland". The latter is the most frequent in ECOSGIMO, which explains a significant part of the overall accuracy gap between ECOSG and ECOSG+. For "Flooded vegetation", ECOSG is in good agreement with ECOSGIMO, while ECOSG+ overestimates "Grassland". For "Urban", ECOSG confuses sparsely built labels with "Grassland" more often than ECOSG+ does. The observed underestimation of sparsely built areas in ECOSG is consistent with previous studies, and ECOSG+ seems to resolve this issue partially.

Table 3 and Table 4 show the F1-scores of ECOSG and ECOSG+ over Iceland for primary and secondary labels with ECOSGIMO as a reference. ESA WorldCover was added at the primary level for information. ECOSG+ has the highest score at the primary level, except for "Forest", "Shrubs", and "Flooded vegetation". Interestingly, ECOSG+ has the worst score for "Shrubs", which is also a secondary label, consistent with other analyses showing that "Shrubs" are still problematic in ECOSG+. At the secondary level, ECOSG+ scores better than ECOSG for 14 labels, while ECOSG has a better score for 4. Neither map identifies "13. Boreal needleleaf deciduous" and "28. LCZ5: open midrise", which explains the NaN values.

Among the labels where ECOSG+ is better than ECOSG in terms of the F1-score, the gap between the two scores is more than 0.1 for 8 labels. Of the 4 labels for which ECOSG has a better F1-score than ECOSG+, the gap is more than 0.1 for two: "4. Bare land" and "17. Temperate grassland". For "4. Bare land", the low F1-score is due to a distribution issue between the "Bare land" primary label and the two secondary labels "4. Bare land" and "5. Bare rock". This issue is due to the limited number of specialist maps distinguishing "4. Bare land" and "5. Bare rock". The bioclimatic distribution between the "Grassland" primary label and the "16. Boreal grassland" and "17. Temperate grassland" secondary labels seems to be off in Figure 7, explaining the relatively low boreal grassland F1-score. This shows the limits of solely relying on [24] for the bioclimatic classification (see Appendix D.3 for more detail).

4. Conclusions

In this work, we presented an agreement-based method for producing ECOSG+, an LULC map at 60 m with ECOSG labels. ECOSG+ results from the assembly of ECOSG and a so-called best-guess map $M^{*}$ representing the most probable ECOSG labels combining information from many suitable datasets. The quality score S represents the confidence level of $M^{*}$ and defines the ECOSG+ assembly rules. A threshold value of 0.525 was defined for $S_{min}$ using the Otsu method, to define pixels where $M^{*}$ replaces ECOSG in ECOSG+.
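
As an illustration of this assembly rule, the following Python sketch combines three toy arrays. It assumes that the best-guess label is used where the quality score is greater than or equal to the threshold (whether the comparison is strict is not stated here) and that all arrays share the same grid. Missing values in the best-guess map are not handled. The function and variable names are hypothetical.

```python
import numpy as np

S_MIN = 0.525  # threshold value reported in the text

def assemble_ecosg_plus(ecosg, best_guess, score, s_min=S_MIN):
    """Use the best-guess label where score >= s_min (assumed rule), else keep ECOSG."""
    use_best_guess = score >= s_min
    return np.where(use_best_guess, best_guess, ecosg), use_best_guess

# Toy 2x2 tiles of secondary labels and quality scores (not real data).
ecosg = np.array([[17, 17], [4, 1]])
best_guess = np.array([[16, 19], [5, 1]])
score = np.array([[0.9, 0.3], [0.525, 0.1]])

ecosg_plus, from_best_guess = assemble_ecosg_plus(ecosg, best_guess, score)
fraction_from_best_guess = from_best_guess.mean()
```

The mean of the boolean mask gives the fraction of pixels sourced from the best-guess map, which is the kind of quantity reported for the full domain.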

Although 33.79% of the resulting ECOSG+ map pixels are sourced from $M^{*}$, ECOSG+ keeps a similar label distribution to the original ECOSG. A qualitative evaluation of ECOSG+ shows the gain in resolution between ECOSG+ and ECOSG and the label correction. The qualitative evaluation also exhibits the limits of $M^{*}$, with some missing values, and therefore the need for the cut-off value $S_{min}$ despite losing higher resolution information.

We performed the first quantitative evaluation of ECOSG, in parallel with ECOSG+, against LUCAS 2022, NLC 2018 and ECOSGIMO. The evaluation against LUCAS revealed the superiority of ECOSG+ across Europe for every primary label at the LUCAS sites. The downsampling of ECOSG+ to 300 m (ECOSG+300) showed that the improvements between ECOSG+ and ECOSG are not only due to resolution. The most improved labels are "Forest" and "Grassland", while "Bare land" and "Shrubs" show small improvements and are the least well-represented labels.

The evaluation of small-scale features against NLC 2018 revealed that ECOSG+ improves upon ECOSG for every label except "Bare land" and "Flooded vegetation", for which the NLC 2018 translation to primary labels has been identified as being less reliable. "Shrubs" were confirmed as poorly represented in every map.

The evaluation of secondary labels against ECOSGIMO showed that ECOSG+ is superior for 14 of the 20 labels represented in Iceland. However, two secondary labels were inferior: "4. Bare land" and "17. Temperate grassland". While their respective primary labels, "Bare land" and "Grassland", were superior in ECOSG+, the lack of a specialist dataset for "4. Bare land" and the sole reliance on [24] for the bioclimatic distribution showed the limits of the method.

The evaluation showed limitations in representing "Bare land", "Shrubs", and "Flooded vegetation". Therefore, implementing new specialist maps covering these labels would greatly benefit these aspects of ECOSG+. The classification of "Urban" areas into local climate zones needs further evaluation. Due to the limited number of datasets explicitly representing local climate zones, the method could be improved by looking at the agreement based on building height and density, similar to what was done for trees (see Appendix D.3). The current study focused on the EURAT domain. However, the method to create ECOSG+ is flexible enough to implement other datasets and to expand ECOSG+ and its verification domain beyond EURAT.

This study showed that ECOSG+ outperforms ECOSG at the primary level regarding overall accuracy and F1-score per label. Still, it would have been desirable to use a larger domain with more secondary labels to ensure the quality of all 33 labels. One way to assess this product could be to compare it to derived measurable quantities.

However, the aim of ECOSG+ is not only to provide an accurate land cover map but also a quality score on the given land cover. Such information is rarely provided along with LULC maps although it enables further use of the LULC information. For example, the quality score can be used to build a trustworthy reference dataset in machine learning applications. This work is done in [25], where ECOSG+ trains the AI model to replace the areas with the lowest S scores.

As land cover is only one aspect of the ECOCLIMAP-SG physiography database, additional work will be needed to ensure the complementarity between land cover and geophysical parameters such as leaf area index, albedo, tree heights, and lake depths.

Author Contributions

All authors contributed to writing the paper. G.B. implemented and ran the method, producing the ECOSG+ map. The figures were prepared by G.B. and T.R. B.P. created ECOSGIMO, used for the evaluation of ECOSG+. S.O. created ESAGHSurban.

Funding

This research received no external funding.

Data Availability Statement

ECOSG+, the best-guess map $M^{*}$ and the quality score map S are available at https://doi.org/10.5281/zenodo.10944693 [26], where each map is a zip file containing 100 .tif tiles covering the EURAT domain (longitudes: -32 to 42 degrees, latitudes: 20 to 72 degrees). Each .tif file has a resolution of 60 m. The zip files are named as follows:

quality_score_map.zip: the quality score map S
best-guess_map.zip: the best-guess map $M^{*}$
ecosg_plus.zip: ECOSG+

Acknowledgments

We would like to thank all the mapping agencies for providing the datasets used to create ECOSG+. We also thank Eurostat for the LUCAS 2022 dataset, and the EPA and Tailte Éireann for providing the NLC 2018 map under the National mapping agreement.

Conflicts of Interest

The contact author has declared that none of the authors has any competing interests.

Abbreviations

The following abbreviations are used in this manuscript:

ECOSG	ECOCLIMAP-SG: a physiography database currently used in NWP
ECOSG+	ECOCLIMAP-SG+: the land cover map described in this manuscript
EURAT	Europe-Atlantic domain (longitudes: -32 to 42, latitudes: 20 to 72)
LCZ	Local Climate Zone
LULC	Land Use Land Cover
NWP	Numerical Weather Prediction

Appendix A. Tables of Land Cover Datasets Used in the Creation of ECOSG+

Table A1. Land Cover Datasets used in the creation of ECOSG+ (1/3)

Name	Reference	Resolution	AOI	Usage in ECOSG+
CALC2020	[27]	10 m	circumpolar Arctic	$M_{b b}$
CGLSLC100	[28]	100 m	global	$M_{s p}$ ( $l_{2}$ = 7, 8, 9, 10, 11, 12, 13, 14)
CGLSLC100F	[28]	100 m	global	$M_{s p}$ ( $l_{2}$ =16, 17, 18)
ESAGHSurban	ESA WorldCover and GHS-BUILT-S according to Table A5	10 m	world	$M_{b b}$
ESAWorldcereal	[29]	10 m	world	$M_{s p}$ ( $l_{2}$ =19, 20, 21)
ESA WorldCover	[23]	10 m	world	$M_{b b}$
ESRI2020	[11]	10 m	world	$M_{b b}$
FROMGLC10	[30]	10 m	world	$M_{b b}$
GHS-BUILT-C	[31]	10 m	global	$M_{s p}$ ( $l_{2}$ =31)
GHS-BUILT-S	[32]	10 m	global	$M_{b b}$ (ESAGHSurban)
GLCZ	[33]	100 m	global	$M_{s p}$ ( $l_{2}$ =4, 5,15,16, 17, 18, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33)
GLC_FCS302020	[34]	30 m	world	$M_{b b}$ , $M_{s p}$ ( $l_{2}$ =4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
GRWLwatermask	[35]	30 m	global	$M_{s p}$ ( $l_{2}$ =1,2,3)
GWL_FCS30	[36]	30 m	world	$M_{s p}$ ( $l_{2}$ =22, 23)
Hydrolakes	[37]	MMA: 10 ha	world	$M_{s p}$ ( $l_{2}$ =2)
OSMsurfacewater	[38]	90 m	world	$M_{s p}$ ( $l_{2}$ =1,3)

Table A2. Land Cover Datasets used in the creation of ECOSG+ (2/3)

Name	Reference	Resolution	AOI	Usage in ECOSG+
CLCplus	[18]	10 m	Europe	$M_{b b}$ , $M_{s p}$ ( $l_{2}$ =7,8, 9, 10, 11, 12, 13, 14,16,17,18)
Coastal2018	[39]	Vector MMU: 0.5 ha MMW:10 m	Europe Coastal areas	$M_{s p}$ ( $l_{2}$ =1, 2, 3, 4, 5, 6, 31, 32)
ELC10	[10]	10 m	Europe	$M_{b b}$
EUCROPMAP	[40]	10 m	Europe	$M_{s p}$ ( $l_{2}$ =19,20,21)
EUMAPOSMgrass	[41]	30 m	Europe	$M_{s p}$ ( $l_{2}$ =16,17,18)
EUMAPlandcover	[42]	30 m	Europe	$M_{s p}$ ( $l_{2}$ = 4, 5, 6, 15, 23 )
EUSALP	[43]	up to 5 m	European Alps Macro region	$M_{s p}$ ( $l_{2}$ =4, 5, 6, 31)
EUhydrocoastline	[44]	Vector MMU: 1 ha	Europe	$M_{s p}$ ( $l_{2}$ =1)
Geoclimate	[45]	Vector MMW: 60 m	Run over multiple large urban area across Europe	$M_{s p}$ ( $l_{2} =$ 16, 17, 18, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33)
GRA2018	[46]	10 m	Europe	$M_{s p}$ ( $l_{2}$ =16, 17, 18)
IMD2018	[47]	10 m	Europe	$ϕ$ $l_{2} = 4, 5$
N2K2018	[48]	Vector MMU: 0.5 ha MMW:10 m	Europe Natura 2000 zones	$M_{s p}$ ( $l_{2}$ =1, 2, 3, 4, 5, 6, 31, 32)
OpenEuroRegionalCoast OpenEuroRegionalIce OpenEuroRegionalLake OpenEuroRegionalRailrdL OpenEuroRegionalRoadL1 OpenEuroRegionalRoadL2 OpenEuroRegionalSea OpenEuroRegionalSoilcrs OpenEuroRegionalWatercrs OpenEuroRegionalWatercrsL	[49]	Vector data1:25000 missing linear small scale features	Europe	$M_{s p}$ ( $l_{2}$ =1, 2, 3, 4, 5, 6, 31, 32)
RPZ2018	[50]	Vector MMU: 0.5 ha MMW:10 m	Europe riparian zones	$M_{s p}$ ( $l_{2}$ =1, 2, 3, 4, 5, 6, 31, 32)
S2GLC	[9]	10 m	Europe	$M_{b b}$ , $M_{s p}$ ( $l_{2}$ =7,8, 9, 10, 11, 12, 13, 14)

Table A3. Land Cover Datasets used in the creation of ECOSG+ (3/3)

Name	Reference	Resolution	AOI	Usage in ECOSG+
COSc2020	[51]	10 m	Portugal	$M_{b b}$
Icelandhabitat	[21]	1:25,000	Iceland	$M_{b b}$ , $M_{s p}$ ( $l_{2}$ =1, 2, 3)
MACATECOSG	MACAT and COSc2020 using Table A6	10 m	Portugal	$M_{s p}$ ( $l_{2}$ =19, 20, 21)
NLCSweden2018	[52]	10 m	Sweden	$M_{b b}$
OCS2020	[53]	10 m	Metropolitan France	$M_{b b}$ , $M_{s p}$ ( $l_{2}$ =4, 5,), grassland, Shr...
WFDCanalIE	[54]	Vector data 1:50000	Ireland	$M_{s p}$ ( $l_{2}$ =3)
WFDCoastalIE	[55]	Vector data 1:50000	Ireland	$M_{s p}$ ( $l_{2}$ =1)
WFDLakeIE	[56]	Vector data 1:50000	Ireland	$M_{s p}$ ( $l_{2}$ =2)
WFDRiverIE	[57]	Vector data 1:50000	Ireland	$M_{s p}$ ( $l_{2}$ =3)
WFDTransitionalIE	[58]	Vector data 1:50000	Ireland	$M_{s p}$ ( $l_{2}$ =1)

Table A4. Land cover included in the code but outside the EURAT domain

Name	Reference	Resolution	AOI	Usage in ECOSG+
NALCMS2020	[59]	30 m	North America	$M_{b b}$
NLCD2019	[60]	30 m	USA	$M_{b b}$ , $M_{s p}$ broadleaf_deciduous, broadleaf_evergreen

Appendix B. Conversion Tables for Particular Cases

Table A5. ESAGHSurban map creation rules. The columns are inclusive, i.e., the urban label corresponds to pixels meeting either the ESA WorldCover condition or the GHS-BUILT-S condition.

ESAGHSurban Primary label	ESA WorldCover labels	GHS-BUILT-S fraction
Water bodies	80 Permanent Water bodies	Not used
Bare land	60 Bare / sparse vegetation	Not used
Snow	70 Snow and ice	Not used
Forest	10 Tree cover	Not used
Shrubs	20 Shrubland	Not used
Grassland	30 Grassland	Not used
Flooded vegetation	90 Herbaceous wetland; 95 Mangroves; 100 Moss and lichen	Not used
Urban	50 Built-up	> 5% of built up surfaces

Table A6. Rules to create MACATECOSG using MACAT and COSc2020. The columns are exclusive, i.e., the MACATECOSG secondary labels require both COSc2020 and MACAT conditions to be met.

MACATECOSG Secondary label	COSc2020 label	MACAT labels
19. Winter C3 Crops	211 Culturas anuais de outono/inverno (winter crops)	1101, 1203, 1305 Aveia (Oat); 1102, 1204, 1306 Azevém (Ryegrass); 1103, 1301, 1307 Trigo (Wheat); 1104, 1302, 1308 Triticale (Triticale); 1105, 1303, 1309 Centeio (Rye); 1106, 1304, 1310 Cevada (Barley); 1311 Courgete (Zucchini); 1312 Pimento (Pepper); 1401 Tremocilha (Lupini beans); 1402 Ervilha (Pea); 1403 Grão de bico (Chickpea); 1404 Fava (Fava) ; 1405 Trevo (Clover); 1406 Feijão (Bean); 1407 Tremoço (Lupine); 1409 Ervilhaças (Peas)
20. Summer C3 Crops	212 Culturas anuais de primavera/verão (summer crops)	1101, 1203, 1305 Aveia (Oat); 1102, 1204, 1306 Azevém (Ryegrass); 1103, 1301, 1307 Trigo (Wheat); 1104, 1302, 1308 Triticale (Triticale); 1105, 1303, 1309 Centeio (Rye); 1106, 1304, 1310 Cevada (Barley); 1311 Courgete (Zucchini); 1312 Pimento (Pepper); 1401 Tremocilha (Lupini beans); 1402 Ervilha (Pea); 1403 Grão de bico (Chickpea); 1404 Fava (Fava) ; 1405 Trevo (Clover); 1406 Feijão (Bean); 1407 Tremoço (Lupine); 1409 Ervilhaças (Peas)
21. C4 Crops	211 Culturas anuais de outono/inverno (winter crops); 212 Culturas anuais de primavera/verão (summer crops); 213 Outras áreas agrícolas (other crops)	1201 Milho (Corn); 1202 Sorgo(Sorghum)

Appendix C. Conversion Tables Used for the Evaluation

Table A7. LUCAS C3 classification conversion to primary labels

Primary label	LUCAS C3 code and name
Water bodies	G10 Inland Water bodies, G11 Inland fresh Water bodies, G12 Inland salty Water bodies, G20 Inland running water, G21 Inland fresh running water, G22 Inland salty running water, G30 Transitional Water bodies, G40 Sea and ocean
Bare land	F10 Rocks and stones, F20 Sand, F40 Other bare soil
Snow	G50 Glaciers, Permanent snow
Forest	C, C1 , C10 Broadleaved woodland, C2, C20 Coniferous woodland, C21 Spruce dominated coniferous woodland, C22 Pine-dominated coniferous woodland, C23 Other coniferous woodland, C30 Mixed woodland, C31 Spruce dominated mixed woodland, C32 Pine dominated mixed woodland, C33 Other mixed woodland
Shrubs	D, D1, D10 Shrubland with sparse tree cover, D2, D20 Shrubland without tree cover
Grassland	E, E1, E10 Grassland with sparse tree/shrub cover, E2, E20 Grassland without tree/shrub cover, E3, E30 Spontaneously re-vegetated surfaces
Crops	every B code from B00 Cropland to B84 Permanent industrial crops
Flooded vegetation	H, H10 Inland wetlands, H11 Inland marshes, H12 Peatbogs, H20 Coastal wetlands, H21 Salt marshes, H22 Salines and other chemical deposits, H23 Intertidal flats, F3, F30 Lichens and moss
Urban	A00 Artificial land, A1, A10 Roofed built-up areas, A11 Buildings with one to three floors, A12 Buildings with more than three floors, A13 Greenhouses, A2, A20 Artificial non-built up areas, A21 Non built-up area features, A22 Non built-up linear features, A30 Other Artificial Areas

Table A8. Conversion of NLC 2018 to ECOSG primary labels

Primary label	NLC 2018 code and label
Water bodies	810 Rivers and Streams, 820 Lakes and Ponds, 830 Artificial Water bodies, 840 Transitional Water bodies, 850 Marine Water
Bare land	210 Exposed Rock and Sediments, 220 Coastal Sediments, 230 Mudflats, 240 Bare Soil and Disturbed Ground, 250 Burnt Areas
Snow	None
Forest	410 Coniferous Forest, 420 Mixed Forest, 430 Transitional Forest, 440 Broadleaved Forest and Woodland, 470 Treelines
Shrubs	450 Scrub, 460 Hedgerows
Grassland	510 Improved Grassland, 520 Amenity Grassland, 530 Dry Grassland
Crops	310 Cultivated Land
Flooded vegetation	540 Wet Grassland, 550 Saltmarsh, 570 Swamp, 610 Raised Bog, 620 Blanket Bog, 630 Cutover Bog, 640 Bare Peat, 650 Fens, 710 Bracken, 720 Dry Heath, 730 Wet Heath
Urban	110 Buildings, 120 Ways, 130 Other Artificial Surfaces
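
A conversion such as the one in Table A8 can be applied to a rasterised tile with a lookup array. The Python sketch below is one possible way to do this and is not taken from the ECOSG+ code. The integer indices given to the primary labels are hypothetical, and any code absent from Table A8 is mapped to "No data" by assumption.

```python
import numpy as np

# Hypothetical integer encoding of the primary labels (index = position in the list).
PRIMARY_LABELS = ["No data", "Water bodies", "Bare land", "Snow", "Forest", "Shrubs",
                  "Grassland", "Crops", "Flooded vegetation", "Urban"]

# NLC 2018 codes per primary label, copied from Table A8.
NLC_TO_PRIMARY = {
    "Water bodies": [810, 820, 830, 840, 850],
    "Bare land": [210, 220, 230, 240, 250],
    "Forest": [410, 420, 430, 440, 470],
    "Shrubs": [450, 460],
    "Grassland": [510, 520, 530],
    "Crops": [310],
    "Flooded vegetation": [540, 550, 570, 610, 620, 630, 640, 650, 710, 720, 730],
    "Urban": [110, 120, 130],
}

def build_lookup(mapping, labels, size=1000):
    """Array indexed by NLC code; unmapped codes stay at 0 ("No data")."""
    lut = np.zeros(size, dtype=np.uint8)
    for name, codes in mapping.items():
        lut[codes] = labels.index(name)
    return lut

lut = build_lookup(NLC_TO_PRIMARY, PRIMARY_LABELS)

# Toy tile of NLC 2018 codes; 999 is not an NLC code and maps to "No data".
nlc_tile = np.array([[510, 450], [730, 999]])
primary_tile = lut[nlc_tile]
primary_names = [[PRIMARY_LABELS[i] for i in row] for row in primary_tile]
```

Indexing the lookup array with the whole tile converts every pixel in one vectorised step.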

Appendix D. Exceptions and Special Cases in the Construction of ECOSG+

Appendix D.1. Exceptions in the Land Cover Maps

Pre-processing consists of uploading all required maps onto GEE and reprojecting them onto the GEE default grid (see [61] for details). Most of the data are hosted in the GEE data catalogue [62] and the so-called "awesome" GEE community catalogue [63]. Additional datasets not in these catalogues, such as CLC+, were manually uploaded to GEE. Another reprojection is made when exporting the ECOSG+ map and its quality estimation in EPSG:4326 at 0.000539° resolution (approximately 60 m). These operations are dealt with by GEE commands that are beyond the scope of this paper. We refer the reader to Wu [61] and Wu et al. [64] for comprehensive documentation on the Python API and the geemap Python package.
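
The correspondence between 0.000539° and approximately 60 m can be checked with a short calculation. The sketch below assumes a spherical Earth with the WGS84 equatorial radius. The latitude of 53° N is only an illustrative value for Ireland.

```python
import math

WGS84_EQUATORIAL_RADIUS_M = 6378137.0
PIXEL_SIZE_DEG = 0.000539  # export resolution quoted in the text

def pixel_size_m(latitude_deg, pixel_deg=PIXEL_SIZE_DEG):
    """North-south and east-west pixel size in metres (spherical approximation)."""
    metres_per_degree = 2 * math.pi * WGS84_EQUATORIAL_RADIUS_M / 360
    north_south = pixel_deg * metres_per_degree
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west

ns_equator, ew_equator = pixel_size_m(0.0)
ns_ireland, ew_ireland = pixel_size_m(53.0)  # illustrative latitude
```

Under these assumptions, the north-south spacing is close to 60 m everywhere, while the east-west spacing shrinks with the cosine of the latitude (roughly 36 m at 53° N).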

Some particular cases in Table A1–Table A4 are noteworthy:

The Geoclimate dataset consists of a map of LCZ obtained by running the Geoclimate tool [45] on the main European urban areas.
The Copernicus Imperviousness HRL does not provide secondary labels but helps to distinguish sand and rocks (secondary labels "4. Bare land", "5. Bare rocks") from concrete runways ("31. LCZ8: large low-rise") (see Appendix D.2). We extract the artificial imperviousness density, denoted $ϕ (x)$, from this dataset.
ESAGHSurban is a combination of ESA WorldCover v200 and the GHS built-up surface (GHS-BUILT-S) converted to primary labels, where GHS-BUILT-S was resampled to the target grid, and missing urban areas have been added from ESA WorldCover following Table A5.
MACATECOSG is a merge of the Portuguese DGterritorio MACAT and COSc2020 [51] maps. Using the rules in Table A6, this merge locates the labels "19. Winter C3 Crops", "20. Summer C3 Crops", and "21. C4 crops" over Portugal.

Appendix D.2. Exceptions in the Specialist Agreement Score

Exceptions to Equation (5) are made in the following cases:

$l_{2} =$ "0. No data". The specialist agreement score is set to 0 everywhere for this label.
Null denominator: $\sum_{j = 1}^{N_{s p}} \mathbb{1} (x \in D_{s p}^{j} \land l_{2} \in L_{2}^{j}) = 0$. When no specialist map provides the label $l_{2}$ at x, the specialist agreement score is set to 0.
$l_{2} \in$ {"4. Bare land", "5. Bare rocks"}. Preliminary experiments showed that confusion often occurs between sand, rocks (secondary labels "4. Bare land", "5. Bare rocks") and concrete runways ("31. LCZ8: large low-rise"), which can be disambiguated thanks to the artificial imperviousness density. Therefore, maps providing the labels "4. Bare land" and "5. Bare rocks" see their score penalised by the imperviousness density (sand and rocks with high imperviousness are likely to be wrong).

Therefore, a more general (but more complicated) formula for the specialist agreement score is:

$$S_{sp}(x, l_{2}) = \alpha(x, l_{2}) \left( \sum_{j=1}^{N_{sp}} \mathbb{1}\left(x \in D_{sp}^{j} \land l_{2} \in L_{2}^{j} \land M_{sp}^{j}(x) = l_{2}\right) \right) + \beta(x, l_{2}) \tag{A1}$$

with the following coefficients:

$$\alpha(x, l_{2}) = \begin{cases} 1 / N_{sp}^{'}(x, l_{2}) & \text{if } N_{sp}^{'}(x, l_{2}) > 0 \text{ and } l_{2} \neq \text{"0. No data"} \\ 0 & \text{otherwise} \end{cases} \tag{A2}$$

$$\beta(x, l_{2}) = \begin{cases} -\phi(x) & \text{if } l_{2} \in \{\text{"4. Bare land"}, \text{"5. Bare rocks"}\} \\ 0 & \text{otherwise} \end{cases} \tag{A3}$$

where $\phi(x)$ is the artificial imperviousness density (ranging between 0 and 1, with 1 being totally human-made impervious ground).
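
The generalised score can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used to produce ECOSG+. In particular, $N_{sp}^{'}(x, l_{2})$ is assumed to be the number of specialist maps that cover x and include $l_{2}$ in their label set (the denominator discussed in the null-denominator exception). The function name, argument layout and toy inputs are hypothetical.

```python
import numpy as np

NO_DATA = 0          # "0. No data"
BARE_LABELS = {4, 5}  # "4. Bare land", "5. Bare rocks"

def specialist_score(label, maps, coverages, label_sets, phi):
    """Specialist agreement score for one secondary label at every pixel."""
    votes = np.zeros(phi.shape, dtype=float)    # maps returning the label
    n_prime = np.zeros(phi.shape, dtype=float)  # maps able to return it (assumed N'_sp)
    for m, covered, labels in zip(maps, coverages, label_sets):
        if label not in labels:
            continue
        n_prime += covered
        votes += covered & (m == label)
    if label == NO_DATA:
        alpha = np.zeros_like(votes)
    else:
        alpha = np.divide(1.0, n_prime, out=np.zeros_like(votes), where=n_prime > 0)
    beta = -phi if label in BARE_LABELS else np.zeros_like(phi)
    return alpha * votes + beta

# Toy inputs: two specialist maps over three pixels (not real data).
maps = [np.array([4, 4, 2]), np.array([4, 5, 4])]
coverages = [np.array([True, True, True]), np.array([True, True, False])]
label_sets = [{2, 4, 5}, {4, 5}]
phi = np.array([0.0, 0.2, 0.9])  # artificial imperviousness density

score_bare_land = specialist_score(4, maps, coverages, label_sets, phi)
score_lakes = specialist_score(2, maps, coverages, label_sets, phi)
score_no_data = specialist_score(NO_DATA, maps, coverages, label_sets, phi)
score_unprovided = specialist_score(3, maps, coverages, label_sets, phi)
```

With the formula as written, the penalty can make the score negative where the imperviousness density is high and no specialist map returns the bare label, as in the third toy pixel.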

Appendix D.3. Exceptions in the Refinement Process

The refinement process described in Section 2.2.1 has three exceptions: a joint maximum in score, the labels with bioclimatic classification and the "Forest" primary label.

Joint Maximum in Score

In Equation (6), in the case of a joint maximum in the specialist agreement score, the argmax returns the label with the lowest label number. For example, if we have a position x, a backbone map $M_{bb}^{i}$ giving $M_{bb}^{i}(x) =$ "Crops", and a joint maximum in the specialist agreement score (e.g., $S_{sp}(x, \text{"21. C4 crops"}) = S_{sp}(x, \text{"20. Summer C3 crops"}) > S_{sp}(x, \text{"19. Winter C3 crops"})$), then the refined map $M_{rf}^{i}(x)$ will return "20. Summer C3 crops" because it has the lowest label number in the joint maximum. This default behaviour is arbitrary, as no better solution has been found so far.
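
This tie-breaking rule matches the behaviour of NumPy's argmax, which returns the first occurrence of the maximum. The sketch below reproduces the crops example with made-up score values; the function name is hypothetical and the snippet is not taken from the ECOSG+ code.

```python
import numpy as np

def refine_pixel(candidate_labels, scores):
    """Return the best-scoring label; ties go to the lowest label number."""
    labels = np.asarray(candidate_labels)
    order = np.argsort(labels)  # sort candidates by label number
    sorted_scores = np.asarray(scores, dtype=float)[order]
    # np.argmax returns the first maximum, i.e. the lowest label number in a tie.
    return int(labels[order][np.argmax(sorted_scores)])

# "Crops" candidates with a joint maximum between labels 20 and 21 (toy scores).
crop_labels = [21, 19, 20]
crop_scores = [0.5, 0.2, 0.5]
refined = refine_pixel(crop_labels, crop_scores)
```

Sorting the candidates by label number first makes the outcome independent of the order in which the labels are supplied.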

Bioclimatic Classification

Bioclimatic classification (i.e., "boreal", "temperate", or "tropical") is used in the "Forest" and "Grassland" primary labels. However, few datasets provide the bioclimatic class. Therefore, the secondary labels that differ only by their bioclimatic class are distinguished following the Beck et al. [24] bioclimatic map, and the score is calculated independently of the bioclimatic class. For example, "8. Temperate broadleaf deciduous" and "9. Tropical broadleaf deciduous" are two types of "broadleaf deciduous". They will have the same score values: $S_{sp}(x, \text{"8. Temperate broadleaf deciduous"}) = S_{sp}(x, \text{"9. Tropical broadleaf deciduous"})$ for all x.

The Forest Primary Label

The "Forest" primary label includes secondary labels having a more regular structure than the other primary labels. Indeed, the seven classes of forest are a combination of

The bioclimatic classification (i.e., "boreal", "temperate", or "tropical")
The dominant leaf type (i.e., "broadleaf", "needleleaf")
The leaf cycle (i.e., "deciduous", "evergreen")

Therefore, the secondary labels that fall under the "Forest" primary label are obtained using a deeper hierarchy of labels than other primary labels. The bioclimatic class is given by Beck et al. [24], as explained in the previous paragraph. The dominant leaf type and the leaf cycle are jointly taken from datasets specific to this information. The refinement score $S_{rf}$ (see Equation (8)) is then calculated ignoring the bioclimatic class.

Appendix E. Limitations of NLC 2018: Comparison against LUCAS

Figure A1 shows the normalised confusion matrices of ECOSG, ESA WorldCover, ECOSG+ and NLC 2018 for the Republic of Ireland. ECOSG+, ECOSG, and ESA WorldCover each have dark-coloured off-diagonal entries, indicating an over-estimation of grassland over Ireland, which seems to be stronger than over the rest of Europe (Figure 5). Shrubs are barely found on any of the maps. They are misclassified as grassland in ESA WorldCover and ECOSG+, while in ECOSG "Shrubs" are distributed between "Shrubs" and "Flooded vegetation" (compare the LUCAS shrub label (y-axis) with what appears in the other datasets (x-axis)). Meanwhile, NLC 2018 has a smaller overestimation of grassland and only confuses grassland with urban area. This is possibly due to the conversion of the NLC 2018 labels "510 Improved Grassland" and "520 Amenity Grassland", but we note that this confusion is less pronounced than for the other datasets. NLC 2018 also indicates the presence of "Flooded vegetation" instead of "Shrubs", another possible consequence of the conversion to primary labels. NLC 2018’s overall accuracy of 0.571 is below the producer accuracy range; however, once "Shrubs" and "Flooded vegetation" are excluded, the overall accuracy is 0.797, within the producer accuracy range. This indicates that NLC 2018 is a valid reference except for the "Shrubs" and "Flooded vegetation" primary labels.

The F1-scores in Table A9 show that NLC 2018 has the highest F1-score for "Water bodies", "Bare land", "Shrubs" and "Urban", and for every other label its F1-score is within 0.058 of the label's highest score. This consistency in the F1-score across all labels confirms that NLC 2018, converted to primary labels, is reasonable over the Republic of Ireland.

Figure A1. Normalised confusion matrices for ECOSG+, NLC 2018, ECOSG, and ESA WorldCover, against LUCAS over Ireland.

Table A9. F1-scores over Ireland for the ECOSG, ECOSG+, NLC 2018, and ESA WorldCover maps with LUCAS 2022 as a reference. The maps have been converted to primary land cover labels. For each primary label, bold font indicates the best result between ECOSG and ECOSG+. ESA WorldCover scores are only informative as we first want to compare NLC 2018, ECOSG and ECOSG+. Stars indicate each row’s highest value.

	ECOSG+	ECOSG	NLC 2018	ESA WorldCover
Water bodies	0.843	0.684	0.896*	0.892
Bare land	0.086	0.086	0.280*	0.053
Forest	0.537	0.164	0.605	0.614*
Shrubs	NaN	NaN	0.080*	NaN
Grassland	0.784	0.692	0.730	0.788*
Crops	0.644	0.411	0.690	0.737*
Flooded vegetation	0.269*	0.205	0.262	0.092
Urban	0.273	0.188	0.456*	0.295
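
The gap between the NLC 2018 F1-score and the best score of each row of Table A9 can be recomputed with a few lines of Python. The values below are copied from the table; the helper name is illustrative, and NaN entries are ignored when taking the row maximum.

```python
import math

NAN = float("nan")
MAPS = ["ECOSG+", "ECOSG", "NLC 2018", "ESA WorldCover"]

# F1-scores from Table A9, in the column order given in MAPS.
F1 = {
    "Water bodies": [0.843, 0.684, 0.896, 0.892],
    "Bare land": [0.086, 0.086, 0.280, 0.053],
    "Forest": [0.537, 0.164, 0.605, 0.614],
    "Shrubs": [NAN, NAN, 0.080, NAN],
    "Grassland": [0.784, 0.692, 0.730, 0.788],
    "Crops": [0.644, 0.411, 0.690, 0.737],
    "Flooded vegetation": [0.269, 0.205, 0.262, 0.092],
    "Urban": [0.273, 0.188, 0.456, 0.295],
}

def gap_to_best(scores, map_name="NLC 2018"):
    """Difference between the row maximum (NaN ignored) and one map's score."""
    valid = [s for s in scores if not math.isnan(s)]
    return round(max(valid) - scores[MAPS.index(map_name)], 3)

gaps = {label: gap_to_best(scores) for label, scores in F1.items()}
labels_where_nlc_is_best = [label for label, gap in gaps.items() if gap == 0.0]
largest_gap = max(gaps.values())
```

The largest gap is found for "Grassland" (0.788 for ESA WorldCover against 0.730 for NLC 2018).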

References

Walsh, E.; Bessardon, G.; Gleeson, E.; Ulmas, P. Using machine learning to produce a very high resolution land-cover map for Ireland. Advances in Science and Research 2021, 18, 65–87. [Google Scholar] [CrossRef]
Bengtsson, L.; Andrae, U.; Aspelien, T.; Batrak, Y.; Calvo, J.; de Rooy, W.; Gleeson, E.; Hansen-Sass, B.; Homleid, M.; Hortal, M.; Ivarsson, K.I.; Lenderink, G.; Niemelä, S.; Nielsen, K.P.; Onvlee, J.; Rontu, L.; Samuelsson, P.; Muñoz, D.S.; Subias, A.; Tijm, S.; Toll, V.; Yang, X.; Køltzow, M. . The HARMONIE–AROME Model Configuration in the ALADIN–HIRLAM NWP System. Monthly Weather Review 2017, 145, 1919–1935. [Google Scholar] [CrossRef]
Masson, V.; Champeaux, J.L.; Chauvin, F.; Meriguet, C.; Lacaze, R. A global database of land surface parameters at 1-km resolution in meteorological and climate models. Journal of Climate 2003. [Google Scholar] [CrossRef]
Faroux, S.; Kaptué Tchuenté, A.T.; Roujean, J.L.; Masson, V.; Martin, E.; Le Moigne, P.; Le Moigne, P. ECOCLIMAP-II/Europe: a twofold database of ecosystems and surface parameters at 1 km resolution based on satellite information for use in land surface, meteorological and climate models. Geoscientific Model Development 2013. [Google Scholar] [CrossRef]
Druel, A.; Munier, S.; Mucia, A.; Albergel, C.; Calvet, J.C. Supplement of Implementation of a new crop phenology and irrigation scheme in the ISBA land surface model using SURFEX_v8.1. Geoscientific Model Development 2022, 15, 8453–8471. [Google Scholar] [CrossRef]
CNRM. Research Demonstration Project Paris 2024 Olympics.
Lemonsu, A.; Alessandrini, J.; Capo, J.; Claeys, M.; Cordeau, E.; de Munck, C.; Dahech, S.; Dupont, J.; Dugay, F.; Dupuis, V. ; others. The heat and health in cities (H2C) project to support the prevention of extreme heat in cities, 2024.
Hagelin, S.; Auger, L.; Brovelli, P.; Dupont, O. Nowcasting with the AROME model: First results from the high-resolution AROME airport. Weather and forecasting 2014, 29, 773–787. [Google Scholar] [CrossRef]
Malinowski, R.; Lewiński, S.; Rybicki, M.; Gromny, E.; Jenerowicz, M.; Krupiński, M.; Nowakowski, A.; Wojtkowski, C.; Krupiński, M.; Krätzschmar, E.; Schauer, P. Automated Production of a Land Cover/Use Map of Europe Based on Sentinel-2 Imagery. Remote Sensing 2020, 12, 1–25. [Google Scholar] [CrossRef]
Venter, Z.S.; Sydenham, M.A.K. Continental-Scale Land Cover Mapping at 10 m Resolution Over Europe (ELC10). Remote Sensing 2021, Vol. 13, Page 2301 2021, 13, 2301. [Google Scholar] [CrossRef]
Karra, K.; Kontgis, C.; Statman-Weil, Z.; Mazzariello, J.C.; Mathis, M.; Brumby, S.P. Global land use / land cover with Sentinel 2 and deep learning. 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, 2021, pp. 4704–4707. [CrossRef]
Lydon, K.; Smith, G. National Land Cover Map of Ireland 2018 Final Report. Technical report, Tailte Éireann in partnership with the Environmental Protection Agency (EPA) and with the support of members of the cross-governmental national landcover and habitat mapping (NLCHM) working group., 2023.
Mallet, C.; Le Bris, A. Current Challenges in Operational Very High Resolution Land-cover Mapping. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives 2020, 43, 703–710. [Google Scholar] [CrossRef]
Radoux, J.; Bourdouxhe, A.; Coppée, T.; De Vroey, M.; Dufrêne, M.; Defourny, P. A Consistent Land Cover Map Time Series at 2 m Spatial Resolution—The LifeWatch 2006-2015-2018-2019 Dataset for Wallonia. Data 2023, 8, 0–10. [Google Scholar] [CrossRef]
Keany, E.; Bessardon, G.; Gleeson, E. Using machine learning to produce a cost-effective national building height map of Ireland to categorise local climate zones. Advances in Science and Research 2022, 19, 13–27. [Google Scholar] [CrossRef]
Kimpson, T.; Choulga, M.; Chantry, M.; Balsamo, G.; Boussetta, S.; Dueben, P.; Palmer, T. Deep learning for quality control of surface physiographic fields using satellite Earth observations. Hydrology and Earth System Sciences 2023, 27, 4661–4685. [Google Scholar] [CrossRef]
Ballin, M.; Barcaroli, G.; Masselli, G. New LUCAS 2022 sample and subsamples design: Criticalities and solutions. Technical report, Publications Office of the European Union, 2022. [CrossRef]
EEA. CLC+Backbone 2018 (raster 10 m), Europe, 3-yearly, Feb. 2023, 2022. [CrossRef]
Ulmas, P.; Liiv, I. Segmentation of Satellite Imagery using U-Net Models for Land Cover Classification, 2020, [arXiv:cs.CV/2003.02899].
Camacho Olmedo, M.T.; García-Álvarez, D.; Gallardo, M.; Mas, J.F.; Paegelow, M.; Castillo-Santiago, M.Á.; Molinero-Parejo, R. Guideline. In Land Use Cover Datasets and Validation Tools: Validation Practices with QGIS; Springer International Publishing: Cham, 2022; pp. 35–46. doi:10.1007/978-3-030-90998-7_3. [Google Scholar] [CrossRef]
Ottósson, J.G.; Sveinsdóttir, A.; Harðardóttir, M. Vistgerðir á Íslandi. Fjölrit Náttúrufræðistofnunar 2016, 54. [Google Scholar]
EEA. CORINE Land Cover 2018 (raster 100 m), Europe, 6-yearly - version 2020_20u1, May 2020, 2020. [CrossRef]
Zanaga, D.; Kerchove, R.V.D.; Daems, D.; Keersmaecker, W.D.; Brockmann, C.; Kirches, G.; Wevers, J.; Cartus, O.; Santoro, M.; Fritz, S.; Lesiv, M.; Herold, M.; Tsendbazar, N.E.; Xu, P.; Ramoino, F.; Arino, O. ESA WorldCover 10 m 2021 v200, 2022. [CrossRef]
Beck, H.E.; Zimmermann, N.E.; McVicar, T.R.; Vergopolan, N.; Berg, A.; Wood, E.F. Present and future Köppen-Geiger climate classification maps at 1-km resolution. Scientific Data 2018, 5, 180214. [Google Scholar] [CrossRef]
Rieutord, T.; Bessardon, G.; Gleeson, E. High-resolution land use land cover dataset for meteorological modelling – Part 2: ECOCLIMAP-SG-ML an ensemble land cover map. Earth System Science Data 2024. [Google Scholar]
Bessardon, G.; Rieutord, T.; Gleeson, E.; Oswald, S. ECOCLIMAP-SG+: an agreement-based high-resolution land use land cover dataset for meteorological modelling, 2024. [CrossRef]
Liu, C.; Xu, X.; Feng, X.; Cheng, X.; Liu, C.; Huang, H. CALC-2020: a new baseline land cover map at 10m resolution for the circumpolar Arctic, 2023. [CrossRef]
Buchhorn, M.; Smets, B.; Bertels, L.; Roo, B.D.; Lesiv, M.; Tsendbazar, N.E.; Herold, M.; Fritz, S. Copernicus Global Land Service: Land Cover 100m: collection 3: epoch 2019: Globe, 2020. [CrossRef]
Tricht, K.V.; Degerickx, J.; Gilliams, S.; Zanaga, D.; Battude, M.; Grosu, A.; Brombacher, J.; Lesiv, M.; Bayas, J.C.L.; Karanam, S.; Fritz, S.; Becker-Reshef, I.; Franch, B.; Mollà-Bononad, B.; Boogaard, H.; Pratihast, A.K.; Koetz, B.; Szantoi, Z. WorldCereal: a dynamic open-source system for global-scale, seasonal, and reproducible crop and irrigation mapping. Earth System Science Data 2023, 15, 5491–5515. [Google Scholar] [CrossRef]
Gong, P.; Liu, H.; Zhang, M.; Li, C.; Wang, J.; Huang, H.; Clinton, N.; Ji, L.; Li, W.; Bai, Y.; Chen, B.; Xu, B.; Zhu, Z.; Yuan, C.; Suen, H.P.; Guo, J.; Xu, N.; Li, W.; Zhao, Y.; Yang, J.; Yu, C.; Wang, X.; Fu, H.; Yu, L.; Dronova, I.; Hui, F.; Cheng, X.; Shi, X.; Xiao, F.; Liu, Q.; Song, L. Stable classification with limited sample: transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Science Bulletin 2019, 64, 370–373. [Google Scholar] [CrossRef]
Pesaresi, M.; Politis, P. GHS-BUILT-C R2023A - GHS Settlement Characteristics, derived from Sentinel2 composite (2018) and other GHS R2023A data, 2023. [CrossRef]
Pesaresi, M.; Politis, P. GHS-BUILT-S R2023A - GHS built-up surface grid, derived from Sentinel2 composite and Landsat, multitemporal (1975-2030), 2023. [CrossRef]
Demuzere, M.; Kittner, J.; Martilli, A.; Mills, G.; Moede, C.; Stewart, I.D.; Vliet, J.V.; Bechtel, B. A global map of local climate zones to support earth system modelling and urban-scale environmental science. Earth System Science Data 2022, 14, 3835–3873. [Google Scholar] [CrossRef]
Liangyun, L.; Xiao, Z.; Xidong, C.; Yuan, G.; Jun, M. GLC_FCS30-2020: Global Land Cover with Fine Classification System at 30m in 2020, 2020. [CrossRef]
Allen, G.H.; Pavelsky, T.M. Global River Widths from Landsat (GRWL) Database, 2018. [CrossRef]
Zhang, X.; Liu, L.; Zhao, T.; Chen, X.; Lin, S.; Wang, J.; Mi, J.; Liu, W. GWL_FCS30: a global 30 m wetland map with a fine classification system using multi-sourced and time-series remote sensing imagery in 2020. Earth System Science Data 2023, 15, 265–293. [Google Scholar] [CrossRef]
Messager, M.L.; Lehner, B.; Grill, G.; Nedeva, I.; Schmitt, O. Estimating the volume and age of water stored in global lakes using a geo-statistical approach. Nature Communications 2016, 7, 1–11. [Google Scholar] [CrossRef]
Yamazaki, D.; Ikeshima, D.; Sosa, J.; Bates, P.D.; Allen, G.H.; Pavelsky, T.M. MERIT Hydro: A High-Resolution Global Hydrography Map Based on Latest Topography Dataset. Water Resources Research 2019, 55, 5053–5073. [Google Scholar] [CrossRef]
EEA. Coastal Zones Land Cover/Land Use 2018 (vector), Europe, 6-yearly, Feb. 2021, 2021. [CrossRef]
D’Andrimont, R.; Verhegghen, A.; Lemoine, G.; Kempeneers, P.; Meroni, M.; van der Velde, M. From parcel to continental scale – A first European crop type map based on Sentinel-1 and LUCAS Copernicus in-situ observations. Remote Sensing of Environment 2021, 266, 112708. [Google Scholar] [CrossRef]
Witjes, M. OSM Grass, 2022.
Parente, L.; Witjes, M.; Hengl, T.; Landa, M.; Brodsky, L. Continental Europe land cover mapping at 30m resolution based CORINE and LUCAS on samples, 2021. [CrossRef]
Marsoner, T.; Simion, H.; Giombini, V.; Vigl, L.E.; Candiago, S. A detailed land use/land cover map for the European Alps macro region. Scientific Data 2023, 10, 1–11. [Google Scholar] [CrossRef] [PubMed]
EEA. EU-Hydro – Coastline - version 1.2, Sep. 2020, 2020. doi:copernicus_v_3035_50_k_hydro-cl_p_2006-2012_v01_r02.
Bocher, E.; Bernard, J.; Wiederhold, E.; Leconte, F.; Petit, G.; Palominos, S.; Noûs, C. GeoClimate: a Geospatial processing toolbox for environmental and climate studies. Journal of Open Source Software 2021, 6, 3541. [Google Scholar] [CrossRef]
EEA. Grassland 2018 (raster 10 m), Europe, 3-yearly, Aug. 2020, 2020. [CrossRef]
EEA. Imperviousness Density 2018 (raster 10 m), Europe, 3-yearly, Aug. 2020, 2020. [CrossRef]
EEA. N2K 2018 (vector), Europe, 6-yearly, Jul. 2021, 2021. [CrossRef]
Eurogeographics. EuroRegionalMap, 2021.
EEA. Riparian Zones Land Cover/Land Use 2018 (vector), Europe, 6-yearly, Dec. 2021, 2021. [CrossRef]
Costa, H.; Benevides, P.; Moreira, F.D.; Moraes, D.; Caetano, M. Spatially Stratified and Multi-Stage Approach for National Land Cover Mapping Based on Sentinel-2 Data and Expert Knowledge. Remote Sensing 2022. [Google Scholar] [CrossRef]
Naturvårdsverket. Nationella marktäckedata 2018: basskikt, 2018.
Thierion, V.; Vincent, A.; Valero, S. Theia OSO Land Cover Map 2020, 2022. [CrossRef]
EPA. Water Framework Directive Canal Waterbodies, 2020.
EPA. Water Framework Directive Coastal Waterbodies, 2020.
EPA. Water Framework Directive Lake Waterbodies, 2020.
EPA. Water Framework Directive River Waterbodies, 2020.
EPA. Water Framework Directive Transitional Waterbodies, 2020.
CEC. North American Land Cover, 2020 (Landsat, 30m), 2023.
Wickham, J.; Stehman, S.V.; Sorenson, D.G.; Gass, L.; Dewitz, J.A. Thematic accuracy assessment of the NLCD 2019 land cover for the conterminous United States. GIScience & Remote Sensing 2023, 60. [Google Scholar] [CrossRef]
Wu, Q. geemap: A Python package for interactive mapping with Google Earth Engine. Journal of Open Source Software 2020, 5, 2305. [Google Scholar] [CrossRef]
Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment 2017, 202, 18–27. [Google Scholar] [CrossRef]
Roy, S.; Schwehr, K.; Pasquarella, V.; Swetnam, T. samapriya/awesome-gee-community-datasets: Community Catalog, 2023. [CrossRef]
Wu, Q.; Lane, C.R.; Li, X.; Zhao, K.; Zhou, Y.; Clinton, N.; DeVries, B.; Golden, H.E.; Lang, M.W. Integrating LiDAR data and multi-temporal aerial imagery to map wetland inundation dynamics using Google Earth Engine. Remote Sensing of Environment 2019, 228, 1–13. [Google Scholar] [CrossRef]

1	See the CNRM wiki page on ECOCLIMAP-SG: https://opensource.umr-cnrm.fr/projects/ecoclimap-sg/wiki (last accessed 12 September 2024).
2	Except for one map of Portugal that does not include the snow primary label.
3	In the case of a joint maximum in $S_{rf}$, we take the $l_{2}$ with the highest $S_{sp}$. If the highest $S_{sp}$ is also a joint maximum, we take the $l_{2}$ with the lowest label number (see Appendix D.3 for an example).
4	Primary labels when the reference is LUCAS 2022 or NLC 2018, secondary labels when the reference is ECOSGIMO.
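
The tie-breaking rule of footnote 3 can be read as a three-level selection: highest $S_{rf}$, then highest $S_{sp}$, then lowest label number. The following is a minimal sketch of that reading, assuming both scores are available as dictionaries keyed by secondary label number. The function name `pick_secondary_label` and the dictionary layout are illustrative assumptions and are not taken from the ECOSG+ code.

```python
def pick_secondary_label(s_rf, s_sp):
    """Choose a secondary label l2 following the rule of footnote 3.

    s_rf, s_sp: dicts mapping label number -> score (hypothetical layout).
    """
    # Level 1: keep the labels that share the maximum of S_rf.
    best_rf = max(s_rf.values())
    tied = [label for label, score in s_rf.items() if score == best_rf]

    # Level 2: among those, keep the labels with the highest S_sp.
    best_sp = max(s_sp[label] for label in tied)
    tied = [label for label in tied if s_sp[label] == best_sp]

    # Level 3: if still tied, take the lowest label number.
    return min(tied)


# Labels 16 and 17 tie on S_rf; label 17 wins on S_sp.
print(pick_secondary_label({16: 0.5, 17: 0.5, 23: 0.2},
                           {16: 0.3, 17: 0.4, 23: 0.9}))
```

Exact equality is used to detect ties here; with scores computed in floating point, a small tolerance might be preferable.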

Figure 1. Illustration showing the backbone and specialist maps, primary labels and the ECOCLIMAP Second Generation secondary labels.

Figure 2. Overview of ECOSG+ (top) and its quality score (bottom) on the EURAT domain. Resampled to 0.1° in EPSG:4326 with the nearest neighbour method.

Figure 3. Distribution of the land cover labels over the EURAT domain at 0.1° resolution for ECOSG+ (outer ring) and ECOSG (inner ring). Labels with less than 2% coverage have been removed from the annotations. 33.79% of the pixels have a quality score above the 0.525 threshold.

Figure 4. Qualitative verification of the ECOSG+ map on a given set of patches.

Figure 5. Row-wise normalised confusion matrices of ECOSG+, ECOSG+ downsampled at 300 m (ECOSG+300), ECOSG and ESA WorldCover over Europe, using LUCAS 2022 as a reference.

Figure 6. Row-wise normalised confusion matrices for ECOSG+, ECOSG, and ESA WorldCover against NLC 2018 over Ireland.

Figure 7. Row-wise normalised confusion matrices for ECOSG+ and ECOSG against ECOSGIMO over Iceland. For visibility, labels not in ECOSGIMO have been removed.

Table 1. F1-scores over Europe for the ECOSG, ECOSG+, ECOSG+300, and ESA WorldCover maps with LUCAS 2022 as a reference. The maps have been converted to primary land cover labels. For each primary label, bold font indicates the highest ECOSG or ECOSG+ F1-score. ESA WorldCover scores are purely informative, as we first want to compare ECOSG and ECOSG+. Stars indicate the highest value per label.

Primary label	ECOSG+	ECOSG+300	ECOSG	ESA WorldCover
Water bodies	0.786	0.709	0.600	0.831*
Bare land	0.154*	0.146	0.126	0.138
Snow	0.788*	0.783	0.708	0.643
Forest	0.709	0.648	0.545	0.749*
Shrubs	0.114	0.094	0.091	0.171*
Grassland	0.599	0.522	0.304	0.622*
Crops	0.614	0.570	0.471	0.705*
Flooded vegetation	0.438*	0.376	0.342	0.367
Urban	0.374	0.321	0.279	0.418*
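
For readers who want to reproduce scores of this kind, the sketch below derives row-wise normalised values and per-label F1-scores from a raw confusion matrix of pixel counts. It assumes that rows index the reference labels and columns the evaluated map, and it returns NaN when a label is absent from both. These conventions and the function names are assumptions made for illustration, not the authors' implementation.

```python
def row_normalise(cm):
    """Divide each row of a confusion matrix by its row sum."""
    out = []
    for row in cm:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def f1_scores(cm):
    """Per-label F1 from a square confusion matrix (rows: reference, columns: map)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]                               # correctly labelled pixels
        fn = sum(cm[k]) - tp                        # reference k, map says otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp   # map says k, reference disagrees
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else float("nan"))
    return scores


# Toy two-label example with made-up counts.
counts = [[8, 2],
          [1, 9]]
print(row_normalise(counts))
print(f1_scores(counts))
```

The expression 2·TP / (2·TP + FP + FN) is the harmonic mean of precision and recall, written so that it stays defined when one of the two is zero.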

Table 2. F1-scores over Ireland for the ECOSG+, ECOSG, and ESA WorldCover maps with NLC 2018 as a reference. The maps have been converted to primary land cover labels.

Primary label	ECOSG+	ECOSG	ESA WorldCover
Water bodies	0.964	0.932	0.966*
Bare land	0.109	0.151	0.043
Forest	0.725*	0.299	0.710
Shrubs	0.000	0.000	0.000
Grassland	0.746*	0.703	0.726
Crops	0.675*	0.400	0.643
Flooded vegetation	0.171	0.552*	0.025
Urban	0.465*	0.340	0.427

Table 3. Primary-label F1-scores for ECOSG+, ECOSG, and ESA WorldCover against ECOSGIMO over Iceland. The maps have been converted to primary land cover labels. Bold font indicates the highest ECOSG or ECOSG+ F1-score. ESA WorldCover scores are purely informative, as we first want to compare ECOSG and ECOSG+. Stars indicate the highest value per label.

Primary label	ECOSG+	ECOSG	ESA WorldCover
Water bodies	0.986*	0.976	0.978
Bare land	0.800*	0.749	0.706
Snow	0.966*	0.950	0.907
Forest	0.106	0.069	0.253*
Shrubs	0.047	0.055	0.228*
Grassland	0.738*	0.566	0.738*
Flooded vegetation	0.174	0.223*	0.028
Urban	0.411*	0.334	0.344

Table 4. Secondary-label F1-scores for ECOSG+ and ECOSG against ECOSGIMO over Iceland. Bold font indicates the highest ECOSG or ECOSG+ F1-score. A third column gives the difference between the two scores to highlight notable F1-score differences (when one value is NaN, it is counted as 0 in the difference).

Secondary label	ECOSG+	ECOSG	Gap (new−old)
1. Sea and oceans	0.992	0.989	0.003
2. Lakes	0.837	0.523	0.314
3. Rivers	0.632	0.201	0.431
4. Bare land	0.428	0.740	-0.312
5. Bare rocks	0.072	0.016	0.056
6. Permanent snow and ice	0.966	0.950	0.016
7. Boreal broadleaf deciduous	0.102	0.000	0.101
8. Temperate broadleaf deciduous	0.001	0.002	-0.001
12. Boreal needleleaf evergreen	0.013	0.012	0.001
13. Boreal needleleaf deciduous	NaN	NaN	NaN
15. Shrubs	0.047	0.056	-0.009
16. Boreal grassland	0.444	NaN	0.444
17. Temperate grassland	0.126	0.298	-0.172
23. Flooded grassland	0.174	0.030	0.144
25. LCZ2: compact midrise	0.046	NaN	0.046
28. LCZ5: open midrise	NaN	NaN	NaN
29. LCZ6: open low-rise	0.569	0.355	0.214
31. LCZ8: large low-rise	0.400	NaN	0.400
32. LCZ9: sparsely built	0.033	NaN	0.033
33. LCZ10: heavy industry	0.329	NaN	0.329
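
The gap column follows the NaN convention stated in the caption of Table 4. A minimal sketch of that convention is given below, assuming that a gap stays NaN when both scores are NaN, as the rows for labels 13 and 28 suggest. The function name `gap` and the rounding to three decimals are illustrative choices.

```python
import math


def gap(new, old):
    """Return new - old, counting a single NaN score as 0."""
    if math.isnan(new) and math.isnan(old):
        # Assumed from the table: no score on either side gives no gap.
        return float("nan")
    new = 0.0 if math.isnan(new) else new
    old = 0.0 if math.isnan(old) else old
    return round(new - old, 3)


nan = float("nan")
print(gap(0.837, 0.523))  # Lakes row
print(gap(0.444, nan))    # Boreal grassland row, ECOSG score is NaN
print(gap(nan, nan))      # label without a score in either map
```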

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
