1. Introduction
Natural hazards such as earthquakes, floods and tornadoes threaten millions of people all over the world [
1]. The effects of these hazards on society and infrastructure depend on the vulnerability towards the hazards [
2]. These vulnerabilities are highly dynamic as some are decreasing due to new building codes, preparedness actions and resilient planning, while others are increasing due to rapid urbanization, increased industrialization, aging infrastructure and stronger interdependencies in modern societies [
3,
4]. Regardless of the different hazards or even combinations of them, it is key for emergency planning, resilience building and first response to catastrophes to understand the risks a society is exposed to. Because risk is the combination of hazard, exposure and vulnerability, all three aspects of the risk chain need to be well understood for any measure to be taken to reduce it [
5].
In this paper, we focus on the exposure part of the risk chain. Exposure models, which describe the spatial distribution of assets (usually buildings and people) and the relative distribution of different building types, show different levels of resolution and precision [
6]. In well-regulated countries, such models may describe the location of each building. In high-resolution studies, each building may be individually described in all relevant parameters. However, in many areas of the world, exposure models are rather coarse and aggregated over large areas, sometimes even over entire countries. As a result, they are useful only if damage or losses are estimated at the same aggregation level. Local planning and local emergency response require exposure models with high resolution, down to the building scale.
To create exposure models on the building scale, the location and additional parameters of the buildings, such as the building footprint, building height and building material, need to be known. This information is usually provided by cadastral data. However, such data is not available everywhere, either because its use is restricted or expensive or because it does not exist [
7].
The free and open geographic data community project OpenStreetMap (OSM) (
https://www.openstreetmap.org/) is potentially able to fill this gap. Although OSM data have been used extensively in disaster mapping and management [
8], their completeness is heterogeneous: some areas are very well mapped, while others lack basic features [
9,
10,
11]. For example, the completeness of highly populated urban areas is often higher than that of remote and rural areas [
10,
12]. There are also differences between developed and developing countries [
11,
13]. These disparities depend on social factors, such as population distribution and population density, as well as the location of contributing users [
14].
Therefore, the assessment of the spatially heterogeneous data quality in OSM is of great importance. Current approaches can be divided mainly into extrinsic and intrinsic approaches. Extrinsic approaches use reference datasets as a benchmark against which OSM is compared, using indicators such as the length of the road network, the number of buildings or the positional accuracy of features such as buildings [10,15,16]. These approaches face the challenge that reference data of sufficient quality is missing, especially for large parts of the Global South. Even if reference data is available, it might be less current than OSM and cover only a subset of the relevant features [17]. To overcome these issues, intrinsic approaches have been developed, which address different aspects of data quality based only on the historical development of OSM [
18,
19,
20,
21]. Completeness of map features is addressed, for example, by fitting saturation curves to the OSM contribution time series and assessing the difference between the fitted asymptote and the current number of objects (e.g., [12]) or by deriving community activity stages [22,23]. The completeness assessment based on saturation curves can only be used for areas with a reasonably high number of OSM features. Other approaches have tried to estimate the expected number of objects based on covariates, such as building density or geometric indicators at street-block level to estimate building completeness [24,25], or socioeconomic indicators, population density and urban-rural gradients [
12,
26,
27]. Given large regional differences in both real-world features (such as building density) and mapping activity, the latter approaches are limited with respect to their transferability between regions, especially across urban-rural gradients or cultural boundaries.
The Humanitarian OpenStreetMap Team (HOT (
https://www.hotosm.org/)) and other humanitarian organizations have been addressing this issue since 2010 by activating volunteers to map buildings and roads. HOT stimulated volunteers mostly through catastrophe-related activities in collaboration with first responders in need of good maps with building locations. This immediate benefit of the volunteers’ work for first responders has certainly drawn a lot of attention to humanitarian mapping activities in OSM. However, HOT and other organizations have not limited their activities to ongoing or imminent catastrophes but expanded them to mapping larger areas, that is, putting them on the map [
11,
14].
While a lot of resources and tools are in place to ease mapping in OSM, some learning effort is still needed for newcomers wanting to contribute. To lower that initial hurdle, the smartphone application
MapSwipe (
https://mapswipe.org/) has been developed and is maintained by the Heidelberg Institute for Geoinformation Technology (HeiGIT) in cooperation with the British Red Cross (BRC), the Humanitarian OpenStreetMap Team (HOT), Médecins Sans Frontières (MSF) and volunteers.
MapSwipe introduced the aspect of gamification to the detection of buildings by showing users satellite imagery and prompting them to select the areas in which they can identify buildings [
28]. Once these areas are marked, the user swipes the satellite imagery aside to receive the next images, hence the name
MapSwipe. As a result of these activities, areas containing no buildings are identified, which makes it much easier for other volunteers to draw building footprints from the satellite imagery, as they do not have to search the entire area of interest. This pre-selection of areas for mapping activities has proven useful, as nearly 50,000
MapSwipe users have mapped more than 1,750,000 km² and finished about 500 projects. The data are publicly available for further use (
https://mapswipe.org/en/data.html). The HOT activities, together with
MapSwipe as well as regular OSM volunteer mapping, have made OSM data a ubiquitous part of disaster planning, emergency management and first response [
29].
MapSwipe conceptually extends desktop-based approaches such as Tomnod to the smartphone, thereby further lowering the bar for volunteers by enabling them to contribute easily during idle periods, such as while riding a subway or waiting for a bus. Tomnod, a former project of the satellite company DigitalGlobe, was known for its campaigns, such as the search for the missing Malaysia Airlines flight MH370, which attracted over eight million participants [
30] before being discontinued in August 2019.
Because risk assessment models that use OSM data also have to address the spatially varying completeness, it is important to identify areas with complete OSM building footprints for which detailed exposure models can be provided. Furthermore, emergency groups can plan additional mapping efforts in unmapped areas that are particularly affected by natural catastrophes. In contrast to the previous MapSwipe project types, which provide information about the presence or absence of buildings on satellite imagery, we introduced a completeness project type that classifies areas with regard to the completeness of OSM building footprints. This is intended to help steer the mapping activities of volunteers (for example in the HOT Tasking Manager) to areas where information is missing for activities such as disaster response or forecast-based financing.
This study investigates the robustness of the completeness data produced by this crowd-sourced approach and examines the following research questions:
What factors influence the performance of the OSM building completeness classification?
How reliable are the results of the completeness feature for use in risk-assessment applications such as exposure modeling?
The new completeness feature in the
MapSwipe application is part of a larger project. Heidelberg University, the German Research Centre for Geosciences (GFZ) in Potsdam, the Karlsruhe Institute of Technology (KIT), the Research Center for Information Technology (FZI) in Karlsruhe and the company Aeromey GmbH have teamed up in the project LOKI (Airborne Observation of Critical Infrastructures) to deliver a system based on OSM data for rapid damage assessment after earthquakes using a variety of technologies, including unmanned aerial vehicles (UAV), machine learning, crowd-sourcing for recording the disaster scene, and exposure models at the building scale. LOKI combines new technologies with existing expertise in earthquake research and earthquake engineering in an interdisciplinary way [
31]. In this light, the completeness feature from
MapSwipe aims to increase the resolution of existing exposure models from aggregated exposure information to a detailed building-by-building description, and to identify areas where further mapping effort is required.
This paper is divided into seven sections. The next section presents the MapSwipe data model, followed by a case-study description covering the four test areas. The subsequent section assesses the robustness of the completeness feature and the derived data and analyzes the performance of the classification result against reference data, which are presented in section five. In the last sections, the results are discussed and final remarks are presented to conclude this paper.
2. MapSwipe Data Model
MapSwipe is a mobile application that was developed within the Missing Maps project in 2014. Generally, the app comprises four important concepts: projects, groups, tasks and results. A project describes a region of interest. Based on the defined region, satellite imagery tiles are requested from a specific imagery provider. When a project is created, the project name, a project image, a zoom level (usually zoom level 18, at which a tile extends approx. 150 m in equatorial areas and about 100 m in central Europe) and the number of users requested to verify a tile can be defined. The MapSwipe tasks correspond to the satellite imagery tiles at the specified zoom level. Other parameters, such as metadata about the map provider, can also be specified.
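The tile extents quoted above can be checked with a short calculation. The following is a minimal sketch that assumes the imagery follows the common Web Mercator tiling scheme, in which the ground width of a tile shrinks with the cosine of the latitude; the function name and the example latitude of 49.4° (roughly Heidelberg) are illustrative choices, not taken from the MapSwipe code base.

```python
import math

# Equatorial circumference of the WGS84 ellipsoid in metres (2 * pi * 6378137),
# which is the value commonly used for Web Mercator tiles.
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0


def tile_width_m(zoom: int, latitude_deg: float) -> float:
    """Approximate ground width of one Web Mercator tile at a given latitude."""
    tiles_per_row = 2 ** zoom
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(latitude_deg)) / tiles_per_row


equator_width = tile_width_m(18, 0.0)    # close to 150 m
europe_width = tile_width_m(18, 49.4)    # close to 100 m
print(f"zoom 18 at the equator: {equator_width:.1f} m")
print(f"zoom 18 at 49.4 deg N:  {europe_width:.1f} m")
```

Under these assumptions, the result is consistent with the approximate values of 150 m and 100 m given in the text.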
Regarding the completeness feature, each task is associated with a satellite imagery tile from Bing Maps with a semi-transparent overlay of the OSM building footprints. The mobile app, representing the client, can then request these tasks from the database. To enable fast and efficient communication between the client and the database, groups have been introduced to reduce the number of client requests. Each group consists of several tasks and represents one mapping session.
Results contain information on the user classification. A single classification result comprises information about the task ID, task geometry and tile classification. For the completeness project type, volunteers classify each task into one of three categories: “no building”, “complete” or “incomplete”. The classification is conducted by tapping on each tile. The main screen of the app is divided into six tasks (cf. Figure 1). A tile is considered complete if the blue OSM building footprints cover all the buildings on the satellite imagery. Conversely, if the OSM building footprints do not cover all buildings visible in the imagery, the tile is regarded as incomplete. If no buildings are present on the satellite imagery, there is no need to tap and the user can swipe to the next screen, thereby indicating that the tile does not contain any buildings. Additionally, the users are aided in the tasks by a brief tutorial. The results of the volunteers can be obtained from the MapSwipeDev-API (
https://dev.mapswipe.org/api/agg_results/).
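The relationship between tasks, groups and results described in this section can be illustrated with a small sketch. All class names, field names and the group size below are hypothetical and chosen for illustration only; they do not reproduce the actual MapSwipe database schema or API. The task geometry is represented here simply by the tile coordinates.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Label(Enum):
    """The three categories of the completeness project type."""
    NO_BUILDING = "no building"
    COMPLETE = "complete"
    INCOMPLETE = "incomplete"


@dataclass(frozen=True)
class Task:
    """One imagery tile at the project zoom level (hypothetical fields)."""
    task_id: str
    zoom: int
    tile_x: int
    tile_y: int


@dataclass(frozen=True)
class Group:
    """Several tasks that are requested together for one mapping session."""
    group_id: int
    tasks: Tuple[Task, ...]


@dataclass(frozen=True)
class Result:
    """The classification one user assigned to one task."""
    task_id: str
    user_id: str
    label: Label


def make_groups(tasks: List[Task], group_size: int) -> List[Group]:
    """Split the tasks of a project into consecutive groups of at most group_size."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [
        Group(group_id=start // group_size, tasks=tuple(tasks[start:start + group_size]))
        for start in range(0, len(tasks), group_size)
    ]


tasks = [Task(task_id=f"18-{x}-0", zoom=18, tile_x=x, tile_y=0) for x in range(14)]
groups = make_groups(tasks, group_size=6)
print([len(group.tasks) for group in groups])
```

Bundling tasks this way means the client issues one request per group rather than one per tile, which is the motivation for groups given above.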
6. Discussion
In this study, we analyzed the quality of the crowd-sourced classification of the completeness of OSM building footprints. We showed that the completeness feature in MapSwipe has the potential to produce spatially explicit information about the completeness of OSM building footprints. The classification performance was influenced by the OSM building density of a task, expressed either by the number of buildings or by their footprint area. More buildings or a larger share of the area covered by building footprints distracted the users from a correct “incomplete” classification. After correcting for the correlated error structure, the share of the footprint area led to a slightly improved model compared to the model based on the number of buildings. Moreover, the classification performance depended on how exactly the OSM layer aligned with the satellite imagery. Presumably, the currentness of the satellite imagery used in MapSwipe is important for the quality of the assessment as well. Unfortunately, image offsets often differ between imagery from different providers. The offset might even vary across the imagery, especially in hilly or mountainous terrain. Using recent imagery might therefore pose a challenge for volunteers if it introduces an offset between OSM building footprints and imagery. Herfort et al. (2017) have shown that other factors, such as the resolution of the satellite imagery, missing images and the presence of clouds, might also influence the quality of the classification. Having successfully tested the approach at four different sites with different building textures, we suggest that the completeness feature in MapSwipe can be applied to most inhabited areas.
A main limitation of this study is the low number of volunteers taking part in the completeness mapping event. It is important to highlight that other authors have shown for OSM that a higher number of volunteers is positively related to the accuracy of the produced data [
42]. Because the answer of each MapSwipe volunteer is also prone to errors, a larger group of volunteers would presumably reduce the overall uncertainty (“wisdom of the crowd”). The same applies to the number of experts. The quality of the classification task clearly depends not only on the properties of the task (such as building density, alignment of OSM and satellite imagery) but also on the experience of the volunteers with such pattern recognition tasks, on the knowledge of potential building types in the area as well as on factors influencing the concentration and motivation of the volunteers [
43,
44,
45]. These factors are by design not available to the researcher, as MapSwipe does not request personal data from the user.
A further limitation of this study was that incomplete tasks did not provide quantitative information about the number of missing buildings. Therefore, the completeness feature does not provide information about the share of missing buildings in the incomplete tasks. While it would be possible to extend the MapSwipe completeness tool with additional classes (such as “mostly complete” or “up to 50% complete”), this would come at the cost of increasing complexity. MapSwipe was designed as a tool that requires only a minimum training effort and that uses a simple and easy-to-learn user interface. Extending the tool with more complex features might reduce its attractiveness for its intended users. The current idea is that MapSwipe is used to identify areas that demand more mapping and that the mapping itself is done in established OSM editors. The number of missing buildings could later be derived from an analysis of the newly mapped features by tools such as ohsome [
20].
[
46] proposed a workflow combining deep-learning and crowd-sourcing methods to generate human settlement maps. An extension of this study could apply an automated approach within the incomplete tiles in order to identify the share of missing human settlements automatically. Completely mapped tiles from nearby might be used in this context as a training dataset. As Pisl et al. (2021) have shown, it is possible to fine-tune pre-trained deep neural networks for building detection based on a relatively small set of additional training data. Furthermore, new products such as the World Settlement Footprint 2015 or similar datasets on the global distribution of built-up areas already relied on crowd-sourcing approaches to assess classification performance and completeness of built-up areas [
48]. In this light, the completeness feature in MapSwipe could be used in future applications to complement automated approaches by generating training as well as validation datasets and could also address specific cases in which automated approaches do not perform well.
Despite the low number of volunteers taking part in the completeness mapping project, this study has shown the characteristics of the data produced by the completeness feature of MapSwipe, which can be useful for exposure models. The misclassifications mostly happened in nearly complete tasks. For exposure modeling, these are of minor importance, since results will only be affected marginally if a few buildings in nearly complete tiles are unmapped. It would have been more problematic if incomplete tiles with a large share of unmapped buildings had been classified as “complete”.
In our study, we focused on the completeness of buildings. We can think of many other OSM classes, such as land-use features or streets, for which a similar completeness-task design could be developed. In the domain of land-use and land-cover classification, studies that underline the potential of crowd-sourcing approaches for better earth observation already exist [
49,
50]. Further studies are needed to fully comprehend which OSM classes perform well and which OSM classes are too complex. The use of MapSwipe to detect incompletely mapped regions at a small scale is limited to features that can be easily detected on satellite imagery. It is not a silver-bullet approach suitable for all aspects of OSM but complements other approaches such as intrinsic and extrinsic data-quality assessments, the incorporation of other Volunteered Geographic Information (VGI) sources such as Twitter [
51] and awareness raising campaigns for mapathons [
52].
Figure 1.
(a) Green-colored tiles representing complete tiles, untapped tiles representing no building tiles; (b) All orange-colored tiles marked as incomplete.
Figure 2.
Case study locations. The completeness of mapping in OSM differed across and inside the case studies. However, all four case studies contained a large number of OSM features, as indicated by the detail maps, which were limited here to the most relevant features. Data source: OpenStreetMap contributors under ODbL and Natural Earth (world map). Map tiles for detailed maps by Carto, under CC BY 3.0.
Figure 3.
(a) Example of a task with low OSM building coverage; (b) example of a task with almost complete OSM building coverage.
Figure 4.
Examples for mismatches between volunteer and expert assessment: (a) Tasks predicted as complete, true class is incomplete; (b) Tasks predicted as incomplete, true class is complete; (c) Tasks predicted as no building, true class is incomplete; (d) Tasks predicted as incomplete, true class is no building. Shown are MapSwipe tiles with the OSM building footprints (blue) overlaid.
Figure 5.
Misalignment of the OSM building-footprint layer in Siros.
Table 1.
Characterization of the MapSwipe projects used as case-study sites for the assessment of building completeness. For the average number of buildings and the average building footprint area per task area, the standard deviation is provided in parentheses.
Name | Area [km²] | Tasks | OSM Building Coverage | Number of OSM Buildings per Task [1/ha] | OSM Building Footprint Area per Task [%]
Tokyo | 27.5 | 1914 | Urban area including fully mapped, partly mapped and unmapped areas | 23.6 (24.4) | 21.0 (17.8)
Taipei | 13.7 | 792 | Urban area including fully mapped, partly mapped and unmapped areas | 3.6 (5.2) | 11.5 (15.2)
Siros | 25.0 | 981 | Island accompanied by smaller patches of agricultural land including fully mapped and partly mapped areas | 7.1 (15.3) | 5.7 (11.4)
Medellin | 23.1 | 1110 | Northern part including high building density with almost completely mapped areas, less densely populated southern part consisting of single-family homes with partly mapped areas | 4.8 (8.0) | 13.4 (16.3)
Total | 89.3 | 4797 | | |
Table 2.
Classification aggregation schema. S_i (x = “no building”) describes the share of users that assigned the label “no building” to task i. S_i (x = “incomplete”) and S_i (x = “complete”) similarly describe the shares of users that assigned the label “incomplete” or “complete” to task i.
Majority Rule | Criteria | Aggregated Result
Clear majority | S_i (x = “no building”) ≥ 0.5 | “no building”
Clear majority | S_i (x = “complete”) ≥ 0.5 | “complete”
Clear majority | S_i (x = “incomplete”) ≥ 0.5 | “incomplete”
Unclear majority | S_i (x = “no building”) == S_i (x = “incomplete”) | “incomplete”
Unclear majority | S_i (x = “incomplete”) == S_i (x = “complete”) | “incomplete”
Unclear majority | S_i (x = “no building”) == S_i (x = “complete”) | “incomplete”
Unclear majority | S_i (x = “incomplete”) == S_i (x = “complete”) == S_i (x = “no building”) | “incomplete”
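The aggregation schema of Table 2 can be sketched in a few lines. This is an illustrative reading of the table, not the authors' implementation: ties between the leading classes are resolved to “incomplete” as in the table, and the one case the table does not list (a single leading class with a share below 0.5) also falls back to “incomplete”, which is an assumption of this sketch. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable

LABELS = ("no building", "complete", "incomplete")


def aggregate_task(user_labels: Iterable[str]) -> str:
    """Aggregate the labels that several users assigned to one task."""
    labels = list(user_labels)
    if not labels:
        raise ValueError("at least one user label is required")
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")

    counts = Counter(labels)
    top_count = max(counts.values())
    leaders = [label for label in LABELS if counts[label] == top_count]

    # Clear majority: exactly one leading class with a share of at least 0.5.
    # Counts are compared instead of shares to avoid floating-point ties.
    if len(leaders) == 1 and 2 * top_count >= len(labels):
        return leaders[0]

    # Unclear majority (ties), and the unlisted "plurality below 0.5" case
    # (assumption of this sketch), are treated as "incomplete".
    return "incomplete"


print(aggregate_task(["complete", "complete", "incomplete"]))
print(aggregate_task(["no building", "complete"]))
```

Resolving every unclear case to “incomplete” is the conservative choice for steering mapping activities, since a doubtful tile is then revisited rather than skipped.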
Table 3.
Classification performance metrics for the completeness classification task. TP – true positives, TN – true negatives, FN – false negatives, FP – false positives.
 | Class | TP | TN | FN | FP | Accuracy | Sensitivity | Precision | F1 Score
Overall performance | no building | 562 | 4144 | 34 | 57 | 0.98 | 0.94 | 0.91 | 0.93
Overall performance | complete | 1516 | 2837 | 72 | 372 | 0.91 | 0.95 | 0.80 | 0.87
Overall performance | incomplete | 2201 | 2095 | 412 | 89 | 0.90 | 0.84 | 0.96 | 0.90
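The metrics in Table 3 follow from the four counts per class. The sketch below uses the standard definitions of accuracy, sensitivity, precision and F1 score; that these are the exact formulas used for the table is an assumption, although they reproduce its rounded values. The function and variable names are illustrative.

```python
from typing import Dict, Tuple


def classification_metrics(tp: int, tn: int, fn: int, fp: int) -> Dict[str, float]:
    """Standard one-vs-rest metrics from true/false positive/negative counts."""
    accuracy = (tp + tn) / (tp + tn + fn + fp)
    sensitivity = tp / (tp + fn)   # also called recall
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Counts (TP, TN, FN, FP) copied from Table 3.
TABLE_3_COUNTS: Dict[str, Tuple[int, int, int, int]] = {
    "no building": (562, 4144, 34, 57),
    "complete": (1516, 2837, 72, 372),
    "incomplete": (2201, 2095, 412, 89),
}

for label, counts in TABLE_3_COUNTS.items():
    metrics = classification_metrics(*counts)
    print(label, {name: round(value, 2) for name, value in metrics.items()})
```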
Table 4.
Confusion matrix of the completeness classification task.
Crowd Classification | Reference dataset
 | “no building” | “complete” | “incomplete” | Total
“no building” | 562 | 4 | 30 | 596
“complete” | 13 | 1516 | 59 | 1588
“incomplete” | 44 | 368 | 2201 | 2613
Total | 619 | 1888 | 2290 |
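The per-class counts in Table 3 can be recovered from the confusion matrix in Table 4. In the sketch below, the off-diagonal entries of each row sum to the FN values of Table 3 and the off-diagonal entries of each column sum to its FP values; this correspondence is an arithmetic observation on the published numbers, and the function name is illustrative.

```python
from typing import List, Tuple

LABELS = ["no building", "complete", "incomplete"]

# Cell values copied from Table 4, row by row.
MATRIX: List[List[int]] = [
    [562, 4, 30],
    [13, 1516, 59],
    [44, 368, 2201],
]


def one_vs_rest(matrix: List[List[int]], k: int) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FN, FP) for class k, with FN taken from row k and FP from column k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                   # rest of the row
    fp = sum(row[k] for row in matrix) - tp    # rest of the column
    tn = total - tp - fn - fp
    return tp, tn, fn, fp


for index, label in enumerate(LABELS):
    print(label, one_vs_rest(MATRIX, index))
```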
Table 5.
Quality measures of the crowd-sourced classification for each site. TP – true positives, TN – true negatives, FN – false negatives, FP – false positives.
Site | Class | TP | TN | FN | FP | Accuracy | Sensitivity | Precision | F1 Score
Siros | no building | 318 | 634 | 13 | 16 | 0.97 | 0.96 | 0.95 | 0.96
Siros | complete | 447 | 448 | 24 | 62 | 0.91 | 0.95 | 0.88 | 0.91
Siros | incomplete | 108 | 772 | 71 | 30 | 0.90 | 0.60 | 0.78 | 0.68
Medellin | no building | 52 | 1049 | 3 | 6 | 0.99 | 0.95 | 0.90 | 0.92
Medellin | complete | 225 | 813 | 15 | 57 | 0.94 | 0.94 | 0.80 | 0.86
Medellin | incomplete | 755 | 280 | 60 | 15 | 0.93 | 0.93 | 0.98 | 0.95
Taipei | no building | 117 | 644 | 15 | 16 | 0.96 | 0.89 | 0.88 | 0.88
Taipei | complete | 219 | 517 | 11 | 45 | 0.93 | 0.95 | 0.83 | 0.89
Taipei | incomplete | 373 | 340 | 57 | 22 | 0.90 | 0.87 | 0.94 | 0.90
Tokyo | no building | 75 | 1815 | 3 | 19 | 0.98 | 0.96 | 0.80 | 0.87
Tokyo | complete | 625 | 1057 | 22 | 208 | 0.88 | 0.97 | 0.75 | 0.84
Tokyo | incomplete | 963 | 703 | 224 | 22 | 0.87 | 0.81 | 0.98 | 0.89
Table 6.
Fixed and random effects for the logistic GLMM regression model for the identification of factors influencing the correctness of the classification for “incomplete” tasks. The coefficients belong to two single predictor models. Coefficients, confidence intervals (CI) and standard errors are reported at the link scale.
 | Coefficient | Std. Error | 95% CI | z-value | p-value
GLMM using building area share as predictor
Intercept | 2.73 | 0.75 | [0.83, 4.65] | 3.62 | 0.00029
OSM building area [%] | -9.11 | 0.54 | [-10.19, -8.07] | -16.83 | < 2 × 10⁻¹⁶
AIC: 1341.0, BIC: 1357.6; Random intercept: σ² = 2.20 (95% CI = [0.82, 3.67]); R²GLMM(m) = 0.24, R²GLMM(c) = 0.55
GLMM using buildings per area as predictor
Intercept | 2.05 | 0.55 | [0.65, 3.45] | 3.71 | 0.00021
OSM building count [1/m²] | -744.9 | 42.4 | [-845.68, -649.57] | -17.57 | < 2 × 10⁻¹⁶
AIC: 1398.1, BIC: 1414.6; Random intercept: σ² = 1.19 (95% CI = [0.60, 2.70]); R²GLMM(m) = 0.26, R²GLMM(c) = 0.46
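Because the coefficients in Table 6 are reported at the link scale, they can be converted into probabilities with the inverse logit. The sketch below shows this for the fixed effects only, with the random intercept set to zero. It rests on several assumptions that the table does not confirm: that the response is coded as 1 for a correctly classified “incomplete” task, that the building area share enters the model as a fraction between 0 and 1 despite the “[%]” label, and that the example predictor values (the Tokyo averages from Table 1) are meaningful inputs. The function name is illustrative.

```python
import math


def predicted_probability(intercept: float, slope: float, x: float,
                          random_intercept: float = 0.0) -> float:
    """Inverse-logit of the linear predictor of a single-predictor logistic model."""
    eta = intercept + random_intercept + slope * x
    return 1.0 / (1.0 + math.exp(-eta))


# Building area share model: 21.0 % footprint share, entered as a fraction (assumption).
p_area = predicted_probability(intercept=2.73, slope=-9.11, x=0.21)

# Building count model: 23.6 buildings per hectare converted to buildings per square metre.
p_count = predicted_probability(intercept=2.05, slope=-744.9, x=23.6 / 10_000)

print(f"area share model:     {p_area:.2f}")
print(f"building count model: {p_count:.2f}")
```

Under these assumptions, both models give a lower probability of a correct “incomplete” classification as the building density of a task increases, in line with the negative coefficients.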