1. Introduction
Remote sensing is an irreplaceable source of information on Earth’s systems [
1,
2]. A broad and constantly increasing portfolio of Earth observation missions enables near real-time monitoring capabilities [
3,
4], whereas rich data archives provide the baseline for long-term analyses [
5,
6]. Land cover and land use change monitoring is among research and application domains frequently employing remote sensing [
7,
8]. Multi-spectral medium-resolution satellite data are commonly used when studying land cover and land use processes because they provide a well-balanced trade-off between spectral sensitivity to changes, spatial resolution, and revisit frequency.
Among optical medium-resolution sensors (i.e., 10-30 m), the Landsat missions are the longest, continuously running Earth observation program. Comparability of acquisition schemes and technical characteristics of the consecutive Landsat missions, especially since the launch of Landsat 4 in 1982, make the Landsat data record a unique source of information for numerous land cover and land use applications [
1,
2,
9]. With a maximum revisit time of 8 days when two satellites are in operation, Landsat can capture subtle long-term and more dynamic land surface processes [
5,
6,
10] relevant for, but not limited to, forestry [
11,
12], agriculture [
13,
14,
15], fresh water [
16,
17,
18] and urban monitoring [
19,
20,
21].
The Sentinel-2 mission of the European Commission’s Copernicus program is another prominent source of medium-resolution Earth observation data for land cover and land use monitoring [
22]. With two satellites in operation phase since December 2015 (Sentinel-2A) and June 2017 (Sentinel-2B), the mission provides multi-spectral observations at 10-20-m resolution [
23] under a ‘free, full and open data policy’ [
24]. Thanks to the constellation design and a wide observation swath, the revisit time is reduced to 5 days at the equator, and to 2-3 days at mid-latitudes. Despite relatively short operation history, Sentinel-2 has already been extensively used in analyses including agricultural [
25,
26,
27], forest [
28,
29], aquatic [
30,
31] and urban ecosystems [
32,
33]. Higher revisit time of Sentinel-2, as compared with Landsat, facilitates using Sentinel-2 data for detection of key management and phenological events [
27,
34,
35].
The technological advancements in data preprocessing, processing and storage, combined with free and open data distribution have facilitated a shift from mono- and bi-temporal analyses to studies based on dense time series of satellite observations [
36,
37,
38]. Moreover, because spectral and spatial consistency is a fundamental prerequisite for time series analyses, past studies have been typically confined to acquisitions from a single sensor system. However, increasing computing capabilities have eased this limitation, allowing for harmonization of vast volumes of data. Consequently, it has become feasible to integrate acquisitions across sensor families with different spatial and spectral resolutions, in order to fill data gaps [
39,
40] and to improve temporal resolution [
41,
42,
43]. Synergetic use of Landsat and Sentinel-2 data has become especially recognized and resulted in various harmonization workflows and datasets [
44] such as the Harmonized Landsat Sentinel-2 (HLS) data [
45], the harmonization workflow available in the Framework for Operational Radiometric Correction for Environmental monitoring (FORCE; Frantz, 2019), and Sen2Like toolbox [
47].
Despite all technological advances, valid observations are always less frequent than the nominal data acquisition frequency due to clouds, cloud shadows, snow, and other disturbances [
48,
49,
50]. Depending on the geographical location of the study area and targeted time period, data availability varies vastly owing to, among others, long-term and seasonal variability of meteorological conditions [
51,
52], as well as historic data acquisition and archiving schemes [
53]. While analyses based on, for example, spectral-temporal metrics [
54,
55,
56] and per-pixel function fitting [
38,
57] are somewhat insensitive to irregular temporal resolution of a time series, analysis approaches such as Fourier and wavelet transforms, trend analyses, and some machine learning implementations require temporally equidistant data distribution (Griffiths et al., 2019; Testa et al., 2018; Udelhoven, 2011).
Pixel-based temporal compositing, also broadly referred to as temporal binning [
58], generates temporally equidistant cloud-free or near-cloud free synthetic images by deriving the ‘best’ per-pixel value within a desired time window. The approach has a long heritage in satellite remote sensing and has been used to derive cloud-free datasets such as the AVHRR GIMMS [
59,
60], MODIS multi-day data products [
61,
62], Landsat WELD [
63], WorldCover Sentinel-2 annual mosaics [
64], and Planet base maps. Temporal composites typically require a minimum of one valid observation per pixel within a time window, although more data generally improve the composites’ quality [
65,
66]. While multiple algorithms exist to select or calculate the ‘best’ pixel value for each time step [
65,
67,
68,
69], a length of the time window tends to be selected arbitrary based on the operator’s intuition and experience, to match the temporal granularity of a studied process (
Table 1). The data availability in time and space over the studied area is rarely explored thoroughly a-priori, making the selection of an appropriate time window a difficult decision, especially for less experienced users. Yet, spatio-temporal inconsistency in data availability combined with the use of a sub-optimal compositing window width may lead to data gaps and lower quality of derived composites and time series, which in turn may impact credibility of subsequent analyses. Composite-based long-term studies of land-cover and land use dynamics are particularly vulnerable to changes in data availability over time, especially when aimed at dynamic ecosystems [
6,
70]. Concomitantly, spatial disparity of data availability is expected over vast regions [
50,
71].
In this study we analyzed how data availability of Landsat and Sentinel-2 over Europe varies in space and time and how these variations translate into feasible lengths of compositing windows, thereupon to support the informed selection of compositing windows for land use and land cover analyses. We calculated per-pixel availability of usable data at 20-km point-grid throughout the complete 1984-2021 Landsat and 2015-2021 Sentinel-2 data records considering their separate and conjoint use, and focused on compositing windows of: five, 10, 15, 20, and 25 days; one, two, three, four, six and 12 months. Specifically, our research objective (RO) were to analyze: i) how composites’ availability changed in Europe altogether and within biogeographical regions across the temporal depth of the archives for both the full calendar year and the growing seasons; ii) spatio-temporal data availability of annual, monthly and 10-day composites commonly used in forest disturbances, land cover, and agricultural monitoring studies, respectively, to determine where and when research could be facilitated or hampered by data availability; iii) the shortest feasible compositing windows over Europe on a yearly basis and across a variety of medium- and long-term time windows, assuming a minimum of 50% of data each year and interpolation of the missing composites; iiii) spatio-temporal capability to capture observations acquired on a specific day, here 15th June or 15th July, that correspond with key phenological states and management events. Altogether, our results aim to support informed decisions on the selection of compositing windows for land cover and land use change analyses.
2. Materials and methods
2.1. Study area
Our study area covers Europe spanning between 25°W and 45°E, and 71°N and 35°N. The area comprises 32 EEA member states (as of September 2022), six Balkan states, the United Kingdom, Ukraine, Belarus and Kaliningrad Oblast. Owing to great latitudinal and longitudinal extend the study area encompasses 17 climate zones [
101] and 10 biogeographical regions (EEA, 2016;
Figure 1). Environmental conditions follow a latitudinal gradient with arctic and cold climate in the northmost regions and hot Mediterranean climate in the south, and longitudinal gradient with humid Atlantic climate in the west that changes to continental and dry climate to the east. Mountain ranges are characterized by alpine conditions.
Variability of environmental conditions in Europe is reflected in land cover and land use. Forests dominate land cover (>40% or the area) with boreal coniferous forests in the north and in the high mountains, and more fragmented mixed and broadleaved stands at lower altitudes and to the south. Approximately 40% of the land in Europe is used for agriculture, which comprises perennial crops, permanent crops and grasslands. Management intensity and landscape complexity vary greatly across the area with highly-managed systems common in Western Europe and Turkey [
103,
104,
105].
Compared to other regions, Europe is characterized by relatively high Landsat and Sentinel-2 data availability [
49,
50,
53]. Furthermore, a discrepancy between higher Sentinel-2 observation frequencies over Europe and other geographical regions is particularly clear for the ramp-up phase of the mission before 2018. High Landsat data availability over Europe in the 1980s and 1990s arises from the extended network of historical ground receiving stations and ESA’s archiving efforts that preserved over 1.2 million unique TM and ETM+ scenes [
53,
106], though only half of these scenes met Level 1 Tier-1 requirements [
107].
2.2. Landsat and Sentinel-2 time series and their preprocessing
We used all Landsat surface reflectance Level 2, Tier 1 (Collection 2) scenes from 1984 through 2021 and Sentinel-2 TOA reflectance Level-1C (pre-Collection-1; European Space Agency, 2021) scenes from 2016 through 2021 acquired over Europe, as available in Google Earth Engine (data accessed in June 2022; Gorelick et al., 2017). We utilized Seninel-2 Level-1C data instead of Level-2A because the Level-2A inherent quality data lack the desired scope and accuracy [
110,
111]. Yet, the Level-1C products are accompanied by cloud probabilities [
112] facilitating improved cloud screening. Furthermore, for cloud screening we also used Band 10 (Cirrus), which is not available as Level-2A. Since we performed data availability analyses, the disparity between Landsat and Sentinel-2 reflectance values was negligible. The difference in processing levels, however, played out in cloud, shadow and snow masking accuracy, where the Sentinel-2 workflow assemble several approaches with known accuracies [
113], but has not been evaluated as the whole. We, however, acknowledge that for real-life applications, data from corresponding processing levels should be used. We recommend thus either preprocessing Sentinel-2 TOA data to achieve desired quality of masks, or linking Sentinel-2 Level-A2 data with Level-1C band 10 and relevant Cloud Probability scenes for more rigorous cloud screening.
To ensure only pixels with the highest quality entered the analysis we applied conservative pixel-quality screening. For Landsat scenes we excluded all pixels flagged as cloud, shadow or snow using the inherent pixel quality bands [
114,
115] and discarded saturated pixels(Zhang et al., 2022). Owing to Landsat 7′s orbit drift [
116], we excluded all ETM+ scenes acquired after 31st December 2020. We applied the following screening steps to Sentinel-2 Level-1C data: we excluded all pixels flagged as clouds and cirrus in the inherent ‘QA60′ cloud mask band; we excluded all pixels with cloud probability higher than 50% as defined in the corresponding Cloud Probability product; we excluded cirrus clouds (B10 reflectance >0.01); we run Cloud Displacement Analyses (CDI<-0.5; Frantz et al., 2018); and finally we excluded dark pixels (B8 reflectance <0.16) by using scene-specific sun parameters within modeled shadows casted by the clouds identified in the previous steps, assuming a cloud height of 2,000 m. We further masked a buffer of 40 m (two pixels at 20-m resolution) around each identified cloud and shadow object. Finally, we excluded snow pixels by applying thresholds from snow mask branch of the Sen2Cor processor [
118].
2.3. Auxiliary data
To identify the pixel-specific growing season we used 2001-2019 time series of the yearly 500-m MODIS land cover dynamics product (MCD12Q2; Collection 6) available in Google Earth Engine (data accessed June 2022). We used only ‘best’ quality per-pixel dates of start and end of the growing season. For pixels with two growing cycles per year, we defined the growing season as a period between the onset of the first growing cycle and the end of the second growing cycle. Because both the start of the season and the end of the season showed considerable variability during the studied period, but without overall significant trends (P=0.227 and P=0.983 for start and end of season, respectively, with accounting for temporal autocorrelation following Ives et al. (2021)), we derived the pixel-based key phenological dates as 25th and 75th percentiles of the 2001-2019 distribution for the start of season and end of season, respectively (
Figures S1 and S2). This approach provided a good approximation allowing us to account for intra-annual variability and to extend the use of derived dates to the entire study period of 1984-2021. Phenological information in the northern- and southern-most parts of Europe was restricted owing to low quality of data and insufficiently pronounced seasonality [
120].
To stratify our results and analyze regional differences in data availability we used Biogeographical Regions (EEA, 2016;
Figure 1). Finally, we derived land cover information based on 2001-2019 MODIS MCD12Q1 annual products [
121,
122,
123] using the most frequent class for each pixel.
2.4. Data availability analysis
We used a 20-km grid of 16,642 equidistant points to analyze the availability of useable Landsat and Sentinel-2 observations over Europe. We distributed points according to the Lambert azimuthal equal-area projection (LAEA, EPSG:3035), which is the preferred projection for EU-wide products. Despite LAEA being the equal-area projection, the distance distortion within our study area was mostly below 10 m, which is less than one pixel in high-resolution Sentinel-2 bands. The systematic gridded sampling design ensured good representation of the West-East and South-North climatic and phenological gradients, and facilitated graphical presentation of results.
We intersected the point grid with each cloud-, shadow, and snow-masked Landsat and Sentinel-2 scene retrieving the date of the valid acquisition. In the process, we kept all datasets in their original resolution and assumed each point to be a probabilistic sample of the surrounding 20x20-km area. We excluded duplicated data entries coming from the vertical overlaps among Landsat tiles in the same row, and vertical and horizontal overlaps among Sentinel-2 granules from the same swath. This resulted in daily data availability for 1984-2021 (1 – valid observation; 0 – no data or unsuitable), which we used to derive data availability information for composites with aggregation periods of five, 10, 15, 20, and 25 days; one, two, three, four, six and 12 months. The non-overlapping compositing windows compartmentalized daily information for each year into 73, 37, 24, 18, 15, 12, six, four, three, two, and one composites for each calendar year, respectively. We used January 1st as the starting date for the compositing window sequence for each year. When the last compositing window was shorter than half its window width, we merged it with the penultimate composite. For each data point and every considered aggregation period we recorded the amount of available observations and considered the composite as ‘successful’ if at least one valid observation was available.
We summarized data availability for each grid point and each compositing window as the percentage of ‘successful’ composites available per year (hereafter ‘data availability’) (RO i). To reflect upon regional diversity of geographical and meteorological conditions we compiled annual mean of composites’ availability (± one standard deviation) within each biogeographical region. We considered Landsat and Sentinel-2 data records jointly and individually, and ran the analyses for the complete calendar year and pixel-specific growing season period as derived from MODIS land cover dynamics products. We included a composite into the growing season period if their timings overlapped at least one day, and reported the rate of ‘successful’ composites during the growing season only. For pixels with the growing season extending across two calendar years (e.g., in the Mediterranean region where the green vegetation peak is typically observed in December), we assigned respective statistics to the year comprising the major part of the growing season in question. Owing to the Sentinel-2 data record starting in mid-2015, we calculated Sentinel-2-only results starting from 2016.
To illustrate how data availability can affect specific LCLUC applications (RO ii), we analyzed the availability of growing season annual and monthly composites, and calendar year 10-day composites, which are commonly used in studies on forest, land cover, and agricultural monitoring, respectively (
Table 1). We evaluated data availability for different land cover classes reporting the overall number of years in the time series with 100% and >50% of the composites derived successfully. To better understand spatio-temporal differences in composites availability, we summarized data availability along the latitudinal and longitudinal gradients. Furthermore, to obtain a detailed insight into the availability of single composites we used selected points across different biogeographical regions across Europe. In Germany and Spain, we intentionally placed two points not further than 35 km away of each other to compare data availability within and outside vertical overlaps among Landsat orbits.
We analyzed the length of the shortest feasible per-pixel compositing window ensuring a minimum of 50% of data availability each year, assuming temporal interpolation of the missing composites (RO iii). We selected the 50% threshold (allowing for temporal clustering of missing composites) as a liberal maximum for a hypothetical data interpolation yielding a complete data record, apprehending that the chosen data reconstruction approach, desired accuracy, and specific application may well lead to different interpolation strategies [
124,
125]. We extended the analyses to a variety of medium- and long-term time periods comprising calendar year and growing season observations, and used two scenarios requiring a minimum 50% of data i) across all the years, and ii) for each year in the time series.
Finally, we examined how data availability can support detection of management practices and key phenological phases (RO iiii). We derived relevant spatial patterns of temporal granularity through identifying a successful composite with the shortest possible compositing window comprising the target dates 15th June and 15th July. We selected both dates owing to their rough coincidence with timing of key phenological phases and management practices in European ecosystems and being relevant for agricultural management practices [
35,
126].
4. Discussion
Compositing is among the data processing approaches facilitating land cover and land use related analyses through providing time series of cloud-free synthetic, temporally equidistant images. Although a lot of consideration is given to the methods identifying the ‘best’ pixel value for each time step, identification of the aggregation period recognizing temporal granularity of the process of interest and data availability over time and space receives less attention. Here, we analyzed lengths of data compositing periods feasible for Landsat and Sentinel-2 time series across Europe, considering minimum data availability requirements, and showcased the implications of data availability on selected applications.
We found that the Landsat data record ensured >50% overall spatial data coverage with monthly composites each year prior to 2011, and ~75% data coverage after 2014. Combining Landsat and Sentinel-2 archives from 2015 onward resulted in ~82% spatial data coverage for monthly composites each year, and between 62% and 75% coverage for 10-day composites. Temporal data coverage across a selection of medium- and long-term time periods was, however, limited in the northern latitudes, which necessitated compositing windows of minimum one month even when liberal ≥50% data availability threshold hence extensive data interpolation were imposed. As expected, data availability varied among biogeographical regions with the highest overall data coverage in Mediterranean and the lowest in Arctic, Alpine, and Boreal regions, hampering the feasibility of using short compositing windows for some applications in those regions. Limiting the annual data window to the growing season typically improved data coverage for composites with temporal granularity of less than one month, and reduced latitudinal diversity of feasible compositing windows, though consistency of data availability across medium- and long-term time periods remained mostly insufficient.
4.1. Data availability over Europe
Predictably, overall data coverage over Europe followed the timeline of the different phases of the Landsat and Sentinel-2 missions (
Figure 6 bottom). The difference in overall data coverage provided by a single Landsat 5 satellite in mid- to late-1990s was only a few percentage points lower compared to times of synergetic operation of Landsat 4 and 5 (1984-1994) and Landsat 5 and 7 (1999-2013) (
Figure 2 vs.
Figure 6), which is due to low acquisition load of Landsat 4, reduced Landsat 5 acquisitions during 1999-2003, and limited capacity of Landsat 7 after 2003. Notably, the Landsat 7 SLC failure in 2003 (Andréfouët et al., 2003) and technical issues of Landsat 5 since November 2011 that led to its decommissioning, had a clear negative impact on data availability, which is visible in spatial and temporal patterns. The first two years of our Landsat record were also characterized by low data coverage due to low acquisition load and potentially also preparations for a commercial phase of the mission operations in 1985 [
37]. We observed relatively high data coverage during the commercialization era of the Landsat Program (1985-2001), which stems from ESA’s archiving efforts and data consolidated [
53,
106], though notably only half of the preserved scenes were successfully processed to Level 1 [
107]. Overall, this distinguishes Europe from other geographical regions that typically have much lower Landsat data availability prior to 2010 [
50], but is far from data density available for this time period for the USA, Australia, and East China.
Since the launch of Landsat 8 in 2013, the overall data availability in Europe has greatly improved. The commence of Landsat 8 mission boosted Landsat-based data coverage providing ~75% annual spatial data coverage with monthly composites. The 2021 decline in data coverage arises from the exclusion of Landsat 7 data acquired after 31st December 2020 [
116], but with Landsat 9 being operational since 2022 [
2] the data availability will further increase. Furthermore, the Sentinel-2 constellation provided a remarkable source of data since 2016. With only 5-day revisit time, Sentinel-2 ensured much higher data frequency, thus typically supported shorter aggregation periods and higher temporal granularity comparing with Landsat. Combining Landsat and Sentinel-2 observations further improved spatial and temporal data coverage for most compositing windows and geographies, allowing much higher temporal granularity and time series consistency, and thus a better monitoring of land cover and land use.
Geographical location was the second factor driving differences in data coverage across Europe. As expected, regions in higher latitudes, and thus characterized by longer winters, and areas under maritime climate prone to higher cloud cover probability [
51], exhibited lower and more irregular data availability, thus required longer aggregation periods even when assuming interpolation of <50% of data. Similarly, mountain ranges also showed lower data coverage arising from cloud formation [
128] and snow cover (Notarnicola, 2020). When restricting our analyses to the per-pixel growing season, data coverage improved in the Alpine, Arctic and Boreal regions. On the contrary, in the Mediterranean region growing season analyses yielded lower data availability than analyses for the calendar year, which is due to growing season being governed by precipitation in the hot climate with dry summers. Finally, focus on the growing season lead to no improvements in the Atlantic biogeographical region, which is characterized by high cloud probability throughout the year. Notably, compositing windows comprising between three and 12 months sometimes yielded lower than expected temporal data availability for the growing season. This steams from our inclusive rules on including a composite as a growing-season-specific, which in some regions introduced composites with very low data availability. Altogether, we found the use of growing season data favorable in some geographies and conditions, while we caution for its general use, recognizing further that many approaches require or benefit from all-year acquisitions [
28,
130].
Finally, orbit overlaps systematically impact data availability At least two times higher average number of acquisitions in the areas of orbit overlaps as compared to the non-overlapping areas also translate into, on average, shorter compositing windows. This inconsistency is further convoluted for the high latitudes where the overlaps are increasingly greater, ultimately leading to multiple overlaps between more than the two neighboring orbits in the northernmost regions of Europe, as well as when more than one data source is used (
Figure S21). Although the greatest difference in number of acquisitions occurs along the latitudinal gradient, the absolute data availability in the northern regions is depleted due to meteorological conditions. Nevertheless, data availability infers heterogeneous quality of per-pixel compositing and different temporal granularity across regions and times, with sometimes startling differences within very close range. Acknowledging this heterogeneity is crucial when selecting the length of a compositing period, and when deriving and discussing results’ quality [
126,
131].
4.2. Impact of the processing workflow on the results
Data preprocessing and processing play a significant role in shaping availability and quality of data in the time series. Here we used all Landsat surface reflectance and Sentinel-2 TOA data available in Google Earth Engine (as of June 2022) and relied on cloud, cloud shadow, and snow masking functionality available in this environment. Including all scenes, regardless their scene-specific cloud cover metadata property, in theory, allowed us to incorporate all usable observations. This presents an advantage over using only acquisitions with scene-specific cloud cover below a certain user-defined threshold (e.g., <75% in Hemmerling et al. (2021) and <70% in Nill et al. (2022) and Frantz et al. (2022)). However, inclusion of the most cloudy scenes may increase the probability of introducing lower quality observations into the time series due to higher geolocation errors and omission errors in cloud and cloud shadow masking, as well as adjacency effects [
113,
114]. In our analyses we relied on Fmask version 3.3.1 for cloud, cloud shadow, and snow masking for Landsat [
114,
115], and expanded cloud and shadow identification for Sentinel-2 beyond the Cloud Probability product. The latter combined with Sen2Cor-based snow detection and the Sentinel-2 B10 cirrus-related threshold likely increased the commission error of cloud masking by excluding bright surfaces, such as rock and sand surfaces [
110], but also boosted detection of cirrus clouds and thus reduced low-quality observations in the time series. Moreover, simplified assumptions about uniform clouds’ height may translate into higher omission of cloud shadows. We acknowledge that performing the same analysis using alternative processing workflows will lead to different results due to the application of different cloud, cloud shadows, and snow masking approaches with often largely different properties in cloud detection user’s and producer’s accuracies [
113] that will affect the number of usable observation. Last but not least, all currently available cloud detection algorithms come with some level of omission error (see Skakun et al., 2022 – note that cloud shadow detection does not reach the same accuracy as cloud detection, see e.g., Zhu and Woodcock, 2012), which means that the ‘successfully’ generated composite may not always be factually cloud-free. This is principally important to acknowledge when data are processed using conceptually simpler cloud, shadow and snow detection approaches. Hence, our analyses are rather conservative with regard to their practical use. Overall, we are hence convinced that our results provide a valid generic overview of availability of usable Landsat and Sentinel-2 data over Europe since 1984.
Finally, our relatively sparse 20-km grid captured environmental gradients and orbit overlaps across Europe well. Although some local variability in data availability arising from cloud formation or snow cover duration may be omitted in our results, the broad data availability across Europe is appropriately represented adequately.
4.3. Implications for time-series analyses
Variability in Landsat and Sentinel-2 availability in time and space translated directly into the feasibility of creating continuous time series of spatially homogeneous composites for different aggregation windows over desired time periods. Despite relative data abundance in the most recent years, especially after 2015 for combined Landsat and Sentinel-2 archives, availability of usable data was often sparse for medium- and long-term applications and analyses. Data scarcity in the first 20 years of the Landsat record combined with occasional low data availability thereafter throttled spatial and temporal consistency of composite-based time series even for annual composites, affecting feasibility of some long-term LCLUC applications or leading to potential inaccuracies in, for example, labeling the time of land cover change.
Big improvement in spatial and temporal consistency of data was secured assuming interpolation of up to 50% of the composites each year (
Figures S13-S17), or across a complete time series of interest (
Figure 8 and
Figure 9). Data enhancement granted long-term analyses based on sub-annual composites, though complete 1984-2021 data coverage for monthly composites, and even more so for shorter compositing windows was still problematic. While reliability of interpolation depends on availability and distribution of the data [
124,
125] and used interpolation algorithms [
133,
134], any data interpolation provides only a simplified representation of the actual temporal variability. Consequently, interpolated data give only a generalized approximation of intra- and inter-annual variability, which is appropriate for some analyses and application areas (e.g., most of the forest disturbance monitoring and overall vegetation productivity studies) but may be problematic for the others (e.g., management intensity and mowing detection analyses).
Importantly, in our analyses we assumed that a single observation within each aggregation period will result in a successful composite. However, the quality of compositing depends on data availability for each compositing window [
65,
66] with some compositing algorithms requiring more than one observation for each aggregation period. Consequently, more conservative compositing approaches may require longer aggregation periods (
Figures S22–S27), with depleted temporal and spatial consistency of the resulting time series.
Overall, our results highlighted challenges in time series analyses based on Landsat and Sentinel-2 arising from highly variable data availability over Europe. Data availability may present a major limitation depending on the application, time frame of the analysis, compositing approach, additional postprocessing, and data interpolation. Time series of composites based on shorter aggregation windows and spanning longer time periods, especially before 2015, are most likely to be affected and comprise data gaps, which, depending on the application, may lend lower credibility of the subsequent analyses. As such, our analysis offers a readily available first assessment of 1984-2021 data availability for local to continental LCLUC analyses, informing thus on potential challenges and uncertainties, therefore allowing for a better understanding of downstream error budgets.
5. Conclusions
Here we demonstrated that the availability of usable 1984-2021 Landsat and Sentinel-2 observations over Europe varies greatly over time and space. This heterogeneity may result in inconsistent composited time series, which directly translates into feasibility and quality of some land cover and land use analyses and applications. This is particularly critical when expanding the temporal domain across different sensors and larger regions, where analyses at all spatial level should take these heterogeneities into account.
Our results indicated that achieving consistent availability of monthly 1984-2021 data over Europe requires extensive data interpolation, which may not be appropriate for all land cover and land use-related analyses and applications. Feasibility of consistent time series based on composites with shorter than monthly aggregation periods is mostly limited to the combined Landsat and Sentinel-2 archives after 2015, yet still may require extensive data interpolation in some geographies. In the Mediterranean biogeographical region, where overall data availability is the best, sub-monthly composites are most achievable. Among others, lower data availability may limit capability for rigorous Landsat-only-based, multi-decadal agricultural monitoring in some regions. However, the current Landsat and Sentinel-2 setups, and the planned Landsat Next and Sentinel Next Generation secure archives of dense data acquisitions that satisfy most requirements of most of the LCLUC applications for the upcoming years.
Overall, our results provide a detailed assessment of Landsat and Sentinel-2 data availability for compositing over Europe, suggesting concrete compositing windows feasible under selected analysis assumptions, time periods, and for exemplified application. Because these results represent only a fraction of possible analysis setups, we acknowledge that, while providing a general guideline for selecting suitable aggregation windows for composite-based studies, our results cannot accommodate all application-specific needs. To support informed decisions on application- and region-specific analyses, we refer users to the interactive, GEE-based interface (see “Data availability”) to use their respective settings for simulating compositing opportunities.