1. Introduction
Urbanisation has meant that three-quarters of Europe’s population now lives in cities. Citizens are constantly confronted with levels of air pollution that exceed the safe thresholds for human health defined by the World Health Organisation (WHO) [1], generally caused by the natural dynamics of the movement of people and the pollution associated with such transport.
According to Eurostat [2], in 2021 there were 369,000 deaths in the EU resulting from diseases of the respiratory system, equivalent to 7.9% of all deaths in the EU-28. The BBC reported [3] that "around 422,000 people died prematurely in European countries in 2018 due to exposure to harmful levels of fine particulate matter PM2.5". Moreover, the problem is worse if we consider that a large part of the population has or may have some kind of allergy, respiratory, or skin problem [4], and that a growing number of allergens aggravates allergic problems, asthma, and other respiratory and skin conditions [5].
In this scenario, Wireless Sensor Networks (WSN) for monitoring Air Quality (AQ) based on low-cost sensors and supported by 5G technologies, together with Artificial Intelligence (AI) techniques [6,7] and official AQ monitoring stations, can help citizens in their day-to-day lives by means of a system that looks after their health when they are on the move, especially when they have respiratory and/or allergic problems.
To this end, the goal of this paper is to propose an inclusive and intelligent routing ecosystem with the objective of calculating healthy routes according to the profile and particular needs of each citizen (which include pathologies and clinical history) in their outdoor movements, assisted by a real-time AQ monitoring network within an Internet of Things (IoT) paradigm. These activities are carried out within the ECO4RUPA project.
Figure 1.
Official Air quality (AQ) monitoring network in Valencia city [
8] and surroundings.
Notice that, at the international level, air quality is governed by ISO 11771:2010 [10] and ISO 37122:2019 [11]. According to the European Directive 2008/50/EC [12], cities with more than 2 million inhabitants must have at least one monitoring station for AQ. Thus, this monitoring network is supported by publicly available data from official AQ monitoring stations for polluting gases (the network of stations of the Generalitat Valenciana [9]) as well as other stations managed by the local councils, such as in Valencia city [8]. In Figure 2a, we show an example of an official AQ monitoring station, in particular the one in Burjassot (outskirts of Valencia, Spain). All the information gathered is also refined with statistical techniques of spatial inference to enhance the spatial resolution of these pollutants over the city maps. Besides, in zones with poor official AQ coverage, we additionally deploy ECO4RUPA AQ monitoring nodes, as shown in Figure 2b,c (outdoor and indoor versions, respectively).
The AQ Index scale is based on the US-EPA 2016 standard and is classified into six categories, given by different ranges and colours, as follows: [0-50] as good (green), [51-100] as moderate (yellow), [101-150] as unhealthy for sensitive groups (orange), [151-200] as unhealthy (red), [201-300] as very unhealthy (purple), and more than 300 as hazardous (dark red). In Figure 1, we show a map with these official AQ monitoring stations, along with an indicator of the AQ index. In this case, all the stations reported reasonably good values at the time they were queried. Also, in Figure 3, we show the AQ index at a larger scale for the Valencia community region [13]. It must be stressed that in Figure 3 the pollution is mainly due to ozone ($O_3$), which is a secondary pollutant derived from the combustion of fossil fuels.
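As an illustration, the six AQ Index ranges listed above can be encoded as a simple lookup. This is only a minimal Python sketch: the names `AQI_CATEGORIES` and `aqi_category` are illustrative and do not belong to any official tool.

```python
# Upper bound of each range, category name and colour, as listed in the text.
AQI_CATEGORIES = [
    (50, "good", "green"),
    (100, "moderate", "yellow"),
    (150, "unhealthy for sensitive groups", "orange"),
    (200, "unhealthy", "red"),
    (300, "very unhealthy", "purple"),
]


def aqi_category(aqi):
    """Return (category, colour) for a non-negative AQ Index value."""
    if aqi < 0:
        raise ValueError("the AQ Index cannot be negative")
    for upper, name, colour in AQI_CATEGORIES:
        if aqi <= upper:
            return name, colour
    # Anything above 300 falls in the last category.
    return "hazardous", "dark red"
```

For instance, `aqi_category(120)` falls in the [101-150] range and therefore maps to the orange category.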
The rest of the paper is structured as follows. In Section 2, we review the available AQ sensors and the related work. In Section 3, we analyse the different design alternatives for the broad monitoring network and its architecture. In Section 4, we consider the options to integrate and merge the information obtained for the route planner. In Section 5, we present and discuss the results for different user profiles. Finally, in Section 6, we summarise the main conclusions and future work.
2. State of the art
With regard to AQ monitoring, it should be noted that there are three distinct areas based on the different types of gases: greenhouse gases (controlled through emissions monitoring), chlorofluorinated gases (analysed in the upper layers of the atmosphere), and pollutants, which include Nitrogen Dioxide ($NO_2$), Sulfur Dioxide ($SO_2$), Carbon Monoxide ($CO$), and Ozone ($O_3$), as well as benzols and heavy metals (Lead ($Pb$), Arsenic ($As$), and Cadmium ($Cd$)). Of these areas, the most relevant for citizens is the last one, the pollutants, most of which come from the combustion of fossil fuels in the city and for which there are regulations and standards for their control, such as Directive 2008/50/EC.
Regarding pollutants, the recent boom in low-cost AQ sensors, driven by their ease of installation and low power consumption, makes them increasingly attractive to integrate into WSN. These sensors can measure pollutants, such as the ones mentioned before, as well as Volatile Organic Compounds (VOC, usually measured in totals, TVOC), Particulate Matter (PM) concentration or particle size distribution, along with Temperature (T), Atmospheric Pressure (AP) and Relative Humidity (RH). Depending on their operating principle, these sensors are available in different technologies that react to the presence of the pollutant, such as electrochemical, metal oxide semiconductor, photo-ionisation detector, non-dispersive infrared, and light scattering, among others.
Manufacturers also integrate different sensors in the same module, which makes them easier to use and more attractive. A list of these types of sensors (or sensor modules) and their main characteristics, in particular the type of gases measured as well as the type of data connection, is shown in Table 1. Of all of them, the one we consider to have the best performance, the largest number of gases, and the best quality/price ratio is the ZPHS01B [17]. Besides, we can highlight different commercial initiatives [18,19] for AQ monitoring, also considered low cost, based on a network system that auto-calibrates the AQ measurements.
However, with reference to the measuring ranges and measurement quality of low-cost AQ sensors, the recent CEN/TS 17660-1:2021 standard has set the criteria established by Directive 2008/50/EC for the equivalence of sensor systems used outdoors with the instruments for indicative measurements and objective estimations. In this scenario, these sensors have many limitations: they do not provide a reliable absolute measurement and therefore cannot be used as a substitute for a reliable reference [20]. In practice, these sensors can be used to provide an order of magnitude and/or awareness of AQ and to allow the identification of pollution hotspots. Nevertheless, to increase the reliability of the readings, the measurements of these sensors can be used as input to a modelling procedure, assisted by AI techniques [6,7] and combined with other data, typically measurements of other pollutants and ambient conditions (T and RH).
Furthermore, if we take into account the pollution information in the city, we can plan and influence the calculation of routes for citizens according to each citizen’s particular profile and needs, which is the task of a route planner. A route planner is a specialised search algorithm designed to find the optimal way to travel between two or more specific locations by minimising a given cost function. In [21], a routing application is introduced that calculates the least polluted route through the streets. The authors employ a modified version of the popular Ant Based Control routing algorithm as the basis for their routing algorithm. To incorporate pollution data while also minimising travel time, the authors tackle a multi-parameter problem.
Similarly, in [
22], the significant health risks associated with air pollution are emphasized, with AQ being influenced by factors such as time of day, location within the city, and traffic intensity. To forecast AQ over time, the authors devise a meteorological model integrated into the Healthy Urban Route Planner (HURP), specifically designed for cyclists and pedestrians in Amsterdam (Netherlands). HURP enables users to select and plan a route that promotes a healthier environment, utilizing information gathered from various systems. Traffic emissions are computed based on observed traffic intensities and emission factors. The authors utilize the
WRF-Chem atmosphere and AQ model, which generates daily forecasts with a 48-hour horizon, providing temperature and pollutant concentration forecast maps. These maps are then transformed into a single metric that combines both factors. Hourly data of this metric is incorporated into the route planner, which employs the open source routing library pgRouting to identify healthier routes. Also, researchers from the National Institute for Public Health and the Environment in the Netherlands (RIVM) have developed the Atlas Living Environment [23]. By utilizing location-specific parameters, they generate maps displaying the local environment, particularly focusing on the densities of four pollutants. These maps are derived from real-time measurements and prediction models. Additionally, the authors have developed an application that forecasts the AQ index for the next 48 hours.
In this line, in [24], a monitoring system is introduced that utilizes a mobile network implemented on Android devices to provide real-time air pollution information to users. The pollution data collected from various sources is stored on a cloud-based server, facilitating real-time analysis and the development of an air pollution model. To measure air pollution levels, eco-sensors are deployed on public transport systems or bicycles. However, low-end sensors often suffer from reduced accuracy compared to more advanced sensors, as mentioned above. In [25], a system for air pollution monitoring in Mauritius Island is shown, featuring a novel data aggregation algorithm specifically designed for air pollution monitoring systems. In [26], dynamic routing was carried out using data on particulate pollutants, specifically PM10. The researchers used the Open Source Routing Machine (OSRM) to perform the routing. Finally, in [27], the authors also explore the integration of air pollution data with route planning, but they propose alternative planning algorithms that aim to distribute traffic more evenly across urban areas. The authors demonstrate that such algorithms not only help alleviate traffic congestion but also contribute to reducing overall air pollution levels in urban environments.
Also, we can highlight several commercial route planner applications, such as Google Maps [28], Ants Route [29], and Here [30], to name a few. Nevertheless, we must stress that these applications are focused mainly on driving and based on the shortest distance.
In summary, the related work shows several initiatives to improve mobility and route planners with different strategies, but none focused on the user’s profile and his/her needs for healthy routes. Addressing this gap is the goal of this paper.
3. Design alternatives and techniques for a broad AQ monitoring network and its architecture
For this purpose, the first step is to design and build the AQ monitoring network based on low-cost elements, adjusted with official AQ monitoring data. This network will be set up with IoT nodes based on a microcontroller that connects to different low-cost AQ sensors, seen in Section 2, with the option of different communication alternatives, as shown in Figure 4. To recover from failures, these IoT nodes incorporate a real-time clock, a memory card, and a watchdog mechanism.
We have initially selected the ESP32 microcontroller [31], due to its performance and quality/price ratio, as each model offers the possibility of having different antennas, as well as of implementing different communication standards. The ESP32 is a series of low-cost, low-power system-on-a-chip microcontrollers that embeds several communication modules. Based on this microcontroller, it is worth mentioning Pycom’s FiPy module [32], which includes technologies such as LoRa/Sigfox, WiFi, Bluetooth, and cellular technologies such as Long Term Evolution (LTE) for machines (LTE-M) and Narrow Band IoT (NB-IoT). Notice that this FiPy module is flexible enough to build this type of IoT node, shown in Figure 4.
In particular, Figure 5 depicts a hardware prototype of the implemented AQ IoT monitoring node, with the connection of the ESP32 microcontroller to the ZPHS01B AQ sensor module. Figure 2b,c show the outdoor version, where the air intake is at the bottom of a tube and a small fan draws the air in, and the indoor version (for tunnels and indoor environments), which includes a tube and a fan that make the air flow pass through the sensor board. Notice, as we mentioned before, that these IoT nodes are used as a coarse reference of the AQ, compared with the measurements provided by the official AQ monitoring stations. In order to be considered valid, the direct measurements taken from these low-cost sensors are processed by AI-based algorithms that correct and adapt them to reliable values [6]. This process is out of the scope of this paper, since we focus only on how to calculate healthy routes according to the particular citizen’s profile and needs.
The communication scheme of the IoT node with the infrastructure is detailed in Figure 6. It is based on the Message Queuing Telemetry Transport (MQTT) IoT protocol, which transmits information via messages between the nodes and the MQTT broker. It should be noted that MQTT offers three levels of Quality of Service (QoS) to verify the delivery of messages, as well as several security mechanisms for the transmitted data. We have chosen the highest QoS level, QoS 2, which guarantees that each message is delivered exactly once, without loss or duplication. In terms of security, we use username and password-based login, both at the broker and at the clients, and SSL certificate-based encryption for transmitted data. The data received is stored locally in a database. For the publishing process, nodes can create a new topic by simply publishing to it, so that more nodes can be added to the IoT system, which greatly facilitates its scalability. This data can also be stored in the cloud, providing additional backup and security against data loss. To graphically visualise the data, the geographical positions of the nodes are indicated.
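The publishing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual firmware: the topic root `eco4rupa/aq`, the JSON field names and the helper functions are hypothetical. The `client` argument is assumed to be any object exposing a `publish(topic, payload, qos)` method in the style of the paho-mqtt `Client`, already connected to the broker with credentials and TLS configured.

```python
import json
import time


def build_topic(node_id, root="eco4rupa/aq"):
    """One topic per node: a new node only needs to publish to its own topic."""
    return f"{root}/{node_id}"


def build_payload(node_id, lat, lon, readings, timestamp=None):
    """Serialise one measurement, including the node position for map display."""
    return json.dumps(
        {
            "node": node_id,
            "lat": lat,
            "lon": lon,
            "ts": int(time.time()) if timestamp is None else timestamp,
            "readings": readings,
        },
        sort_keys=True,
    )


def publish_reading(client, node_id, lat, lon, readings, timestamp=None):
    """Publish one measurement with QoS 2 (exactly-once delivery)."""
    payload = build_payload(node_id, lat, lon, readings, timestamp)
    return client.publish(build_topic(node_id), payload, qos=2)
```

Keeping the topic and payload construction separate from the client makes the message format easy to check without a running broker.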
Notice that the placement of these ECO4RUPA low-cost AQ monitoring nodes will improve the coverage given by the official AQ monitoring stations, as mentioned before, following the criteria explained in Section 4.2.
4. Data fusion, spatial interpolation, and route planner application
This section describes the core of the healthy route planner. The goal is to calculate healthy walking and/or cycling routes according to the particular citizen’s profile and needs. The flowchart of this service is shown in Figure 7. Initially, the user launches a request for a route calculation. Then, his/her user profile is analysed and, based on it, the appropriate variables (specific pollutants) are considered, performing a complete interpolation over the area of interest defined by the search using the Kriging technique. Later, these values are superimposed on the geographical map and define the metric to be minimised in the route search.
4.1. Analysis of the user’s profile: weighting pollutants
Based on the user’s requirements (specified within his/her profile), we estimate the pollution as a combination of the different parameters (pollutants), weighting their measurements in the area of user mobility. In particular, as a proof of concept and without loss of generality, we have considered citizens with asthma and pregnant women. In this case, we use the following weights for the different pollutants according to the literature.
In case of asthma, we assign 40% for ozone,
, (
), 10% for
(
) and 50% for
. These weights are assigned because
is considered one of the most dangerous pollutants, as it can penetrate the lungs and cause various health problems [
33]. Furthermore,
is an oxidant known to irritate the airways and has a clearly defined effect on asthma exacerbation [
34]. Also, according to [
34,
35], the results of a meta-analysis, there is evidence to support the link between increased ozone concentration and
, worsening asthma.
In case of pregnant women, we assign 5% for
(
), 35% for
(
), 20% for
, 30% for
and 10% for
. These gases are chosen because according to [
36], exposure to
,
and
, is associated with a reduction in neonatal weight and exposure to
and
has been related to an increased risk of premature birth. In addition,
adds delay in the development of children’s attention span, according to a study conducted by [
37]. We assign 30% for
, because exposure to this gas can directly affect the fetus through oxygen deficiency [
38], which can cause brain damage, developmental delay, and complications during pregnancy [
39]. For
and
, we assign a weight of 20% for
and 10% for
, because it has been associated with complications during pregnancy, such as premature birth, low birth weight, and respiratory problems in the fetus. Finally, we assign only 5% for
, because we have found studies that say that
exposure increases the risk of premature birth, low birth weight, and respiratory problems in the fetus, but others do not confirm this relation [
38].
Notice that, in practice, these weights will be personalised based on the end user’s requirements, and they could even be saved within his/her profile.
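The profile-based weighting can be illustrated with a short Python sketch. Several points here are assumptions made only for the example: the readings are taken to be already expressed on a common scale (for instance, per-pollutant sub-indices), the function name is hypothetical, and, while the 40%/10%/50% split for asthma comes from the text, the assignment of the 10% and 50% weights to $NO_2$ and PM2.5 is only assumed for illustration.

```python
def weighted_pollution(readings, weights):
    """Combine per-pollutant values into one profile-specific pollution value.

    readings: {pollutant: value on a common scale}
    weights:  {pollutant: weight}, expected to sum to 1
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("profile weights must sum to 1")
    missing = set(weights) - set(readings)
    if missing:
        raise KeyError(f"no reading for: {sorted(missing)}")
    return sum(w * readings[p] for p, w in weights.items())


# Hypothetical asthma profile: only the 40% for ozone is certain from the text;
# the pollutants behind the 10% and 50% weights are assumed here.
ASTHMA_WEIGHTS = {"O3": 0.40, "NO2": 0.10, "PM2.5": 0.50}
```

A personalised profile would simply supply a different `weights` dictionary, which matches the idea of storing the weights within the user's profile.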
4.2. Kriging for spatial interpolation of pollution
Since the spatial sampling for a real-time map of the pollutants is still limited to the spots where the IoT nodes are deployed and/or the official AQ monitoring stations are installed, a spatial interpolation technique is required: accurate pollution estimates are needed at the different points over the city map in order to analyse the different paths for the routes.
Kriging [40] is a spatial statistical technique that allows the analysis of geolocated information and is based on spatial autocorrelation, unlike other techniques such as Inverse Distance Weighting (IDW) and Splines [40,41]. The main idea of Kriging is that the estimated variable is given by a deterministic part (without spatial influence) and a random part (with spatial influence). Kriging employs the spatial function from the random part in order to deliver the best linear unbiased estimator. Thus, the information gathered from the IoT nodes establishes a dataset associated with different locations given by their coordinates (longitude and latitude), as a first step to applying the Kriging technique.
For Kriging, let $D$ denote a region of interest within a map, where $D \subset \mathbb{R}^2$. Within the region $D$, we want to measure some variable $z$. Let $Z(s)$ denote the random variable that can be measured at location $s$ in the region. In practice, measurements are obtained at a finite number $n$ of points: $z(s_1), z(s_2), \ldots, z(s_n)$.
In this case, for geostatistical data, the covariance function for the response $z$ at two different points $s_i$ and $s_j$ depends only on the difference in locations (distance and direction) between the two points, and it is given by Equation (1),
$$\mathrm{Cov}\big(Z(s_i), Z(s_j)\big) = c(s_i - s_j) \qquad (1)$$
for some function $c$. The key of Kriging is to characterise the random part of the estimated variable via the variogram function, which represents an index of change of the variable with respect to the distance. The function $2\gamma(\cdot)$ is called the Variogram and the function $\gamma(\cdot)$ is the Semivariogram [42], given by:
$$\gamma(s_i - s_j) = \frac{1}{2}\,\mathrm{Var}\big(Z(s_i) - Z(s_j)\big) \qquad (2)$$
Assuming the mean $\mu$ and the variance $\sigma^2$ are constant throughout a specific region, that is, considering a second-order and intrinsically stationary spatial process, the variogram for an isotropic process in terms of the auto-correlation function ($\rho$) between two spatial points can be written as
$$2\gamma(h) = 2\sigma^2\big(1 - \rho(h)\big) \qquad (3)$$
where $h$ denotes the distance (or lag) between the two points $s_i$ and $s_j$. In terms of the auto-covariance function, the variogram function becomes
$$2\gamma(h) = 2\big(\sigma^2 - c(h)\big) \qquad (4)$$
From the previous expression, it is easy to see that when $h$ becomes larger, $c(h) \to 0$ and the variogram converges to $2\sigma^2$.
Figure 8 shows a theoretical variogram. As the distance
h gets larger, the variogram values increase indicating that as points get farther apart, the expected difference between the measured values at those two points increases as well.
The variogram can be described based on the following attributes, as depicted in
Figure 8:
The Sill: corresponds to the maximum height of the variogram curve. As $h$ gets large, the correlation (and hence the covariance) between the measured values at two points separated by a distance $h$ becomes negligible, so the values are effectively independent.
The Range: is the distance $h$ such that pairs of sites further than this distance apart are negligibly correlated. The range of influence is sometimes defined as the point at which the curve reaches 95% of the difference between the nugget and the sill.
The Nugget Effect: it is expected that $2\gamma(0) = 0$, i.e. the variogram should be equal to zero if $h = 0$. However, this is usually not the case. As the distance $h$ goes to zero, there is a nugget effect due to measurement error and micro-scale variation.
Thus, as a first approach, the variogram is estimated from the values measured at different locations inside region D; this estimate is called the empirical variogram. However, in order to be useful, this empirical variogram needs to be replaced later by a model. Some common models are spherical, exponential, Gaussian, and power. It is worth mentioning that, in this step, there is room to adjust and improve the placement of the ECO4RUPA AQ monitoring nodes in order to enhance this variogram.
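To make the role of the nugget, sill and range concrete, two of the common models mentioned above can be written as follows. This is a sketch under stated assumptions: the functions return the semivariogram $\gamma(h)$ (half the variogram), `sill` denotes the total sill including the nugget, and the factor 3 in the exponential model is one common "practical range" convention, under which the curve reaches about 95% of the way from the nugget to the sill at `rng`. The function names are illustrative.

```python
import math


def spherical(h, nugget, sill, rng):
    """Spherical semivariogram: reaches the sill exactly at h = rng."""
    if h < 0:
        raise ValueError("the lag h must be non-negative")
    if h == 0:
        return 0.0  # gamma(0) = 0 by definition; the nugget is the jump for h > 0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, sill, rng):
    """Exponential semivariogram: approaches the sill asymptotically."""
    if h < 0:
        raise ValueError("the lag h must be non-negative")
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))
```

In a fitting step, the three parameters would be adjusted so that the chosen model follows the empirical variogram as closely as possible.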
Notice that the classical method of estimating the variogram, which corresponds to the method-of-moments estimator, is given by [42]:
$$2\hat{\gamma}(h) = \frac{1}{|N(h)|} \sum_{(s_i, s_j) \in N(h)} \big(z(s_i) - z(s_j)\big)^2 \qquad (5)$$
where $N(h)$ is the set of all distinct pairs of points ($s_i$, $s_j$) such that $\lVert s_i - s_j \rVert = h$, and $|N(h)|$ is the number of such pairs. Note that the data is usually smoothed to generate an estimate of the variogram. For instance, the data can be partitioned into groups, where the observations in a particular group are within a certain range of distances, and the average squared difference of the points in each group is then used in Equation (5).
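The binned version of the method-of-moments estimator can be sketched in a few lines of Python. The assumptions are that the coordinates are planar (for example, projected metres, so that Euclidean distance is meaningful), that equal-width distance bins are used, and that the function returns the semivariogram (half the average squared difference). The function name and the bin parameters are illustrative.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, values, bin_width, n_bins):
    """Return a list of (mean lag, gamma_hat, number of pairs) per non-empty bin.

    points: list of (x, y) planar coordinates
    values: measured value at each point
    """
    sq_sums = [0.0] * n_bins
    lag_sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(points)), 2):
        h = math.dist(points[i], points[j])
        b = int(h // bin_width)  # index of the distance bin
        if b >= n_bins:
            continue  # pair is farther apart than the last bin
        sq_sums[b] += (values[i] - values[j]) ** 2
        lag_sums[b] += h
        counts[b] += 1
    return [
        (lag_sums[b] / counts[b], sq_sums[b] / (2 * counts[b]), counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]
```

Bins with very few pairs give noisy estimates, which is one reason why the placement and number of monitoring nodes affects the quality of the variogram.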
Finally, using linear interpolation, we can estimate the value $\hat{z}(s_0)$ at location $s_0$ based on $N$ measurements $z(s_1), \ldots, z(s_N)$ as,
$$\hat{z}(s_0) = \lambda_0 + \sum_{i=1}^{N} \lambda_i\, z(s_i) \qquad (6)$$
where $\lambda_i$ are the weights applied to $z(s_i)$ and $\lambda_0$ is a constant. The estimation error $\hat{z}(s_0) - z(s_0)$ is unknown, but the estimator must be unbiased and its error variance is derived from the input variogram model. Thus, the goal of Kriging is to find the weights $\lambda_i$ for the linear estimator that minimise the estimation variance. In other words, Kriging is a linear least-squares estimation algorithm.
In addition, based on the treatment of the mean value of the stochastic field, there are different types of Kriging: Simple Kriging (SK), Ordinary Kriging (OK), Universal Kriging (UK), Lognormal Kriging, etc. The main ones are SK and OK, which differ mainly in the use of the samples’ mean. In Simple Kriging, the mean value is assumed to be known and constant for the entire domain. Unlike Simple Kriging, Ordinary Kriging considers the mean value unknown, constant, and equal to $\mu$ only for $s_0$ and the $n$ points that enter into the estimation of $\hat{z}(s_0)$, giving a more realistic approach to the estimation process.
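As a hedged illustration of Ordinary Kriging, the sketch below solves the standard OK system in its semivariogram form: the weights $\lambda_i$ satisfy $\sum_j \lambda_j \gamma(s_i - s_j) + m = \gamma(s_i - s_0)$ for every sample $i$, together with the unbiasedness constraint $\sum_j \lambda_j = 1$, where $m$ is a Lagrange multiplier. It assumes planar coordinates and a user-supplied semivariogram model `gamma(h)`. All function names are illustrative, and the small linear solver is included only to keep the example self-contained.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular kriging system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(points, values, target, gamma):
    """Return (estimate at target, kriging weights) for semivariogram gamma(h)."""
    n = len(points)
    # Left-hand side: semivariogram between samples, plus the constraint row.
    a = [
        [gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
        for i in range(n)
    ]
    a.append([1.0] * n + [0.0])
    # Right-hand side: semivariogram between each sample and the target.
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, values))
    return estimate, weights
```

With two samples and a target at their midpoint, symmetry gives both samples the same weight, so the estimate is their average. At a sampled location the estimator reproduces the measured value, provided the model satisfies $\gamma(0) = 0$.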
4.3. Mapping of pollution over the grid on the city
Once we define the pollution for each user and it has been interpolated with OK over the city map on the area of the user mobility, we need to map this pollution over the grid of the city.
For this, the street network within a city is represented as a bidirectional graph, where nodes correspond to intersections and edges represent street segments. Each edge is associated with a parameter that signifies the cost of travelling along that particular segment. Common routing algorithms aim to minimise the total cost of a route, that is, the cumulative cost along its edges. This total is known as the cost function and, in this case, it is given by the exposure to pollution for each user.
Notice that to do this, we have to assign the pollution at each point over the city map by using a grid of 0.0001 decimal degrees, which corresponds to approximately 11.1 meters (since 360º corresponds to the whole perimeter of the Earth, 40075 km, one degree spans about 111.3 km). It is necessary to find the match between each graph node and its corresponding grid point, and then assign the pollution value of the grid point to a new attribute of the graph node. In particular, the grid dataframe index is defined as a combination of the longitude and latitude coordinates, and the corresponding point is found by searching the grid using these coordinates as indices.
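The grid arithmetic and the node-to-grid matching can be sketched as follows. This is an illustration under assumptions: a plain dictionary keyed by integer grid indices stands in for the dataframe index described above (integer keys avoid comparing floating-point coordinates for equality), and all names are hypothetical. The first function also shows that the east-west size of a cell shrinks with the cosine of the latitude, while the north-south size stays at roughly 11.1 m.

```python
import math

EARTH_PERIMETER_M = 40_075_000  # perimeter of the Earth quoted in the text
GRID_STEP_DEG = 0.0001


def grid_step_metres(lat_deg=0.0):
    """Return the (north-south, east-west) size of one grid cell in metres."""
    north_south = EARTH_PERIMETER_M / 360.0 * GRID_STEP_DEG
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


def grid_key(lon, lat):
    """Snap a coordinate to the nearest grid point, as integer grid indices."""
    return round(lon / GRID_STEP_DEG), round(lat / GRID_STEP_DEG)


def assign_pollution(nodes, grid):
    """Attach the pollution of the matching grid point to each graph node.

    nodes: {node_id: (lon, lat)}
    grid:  {grid_key: interpolated pollution value}
    Nodes without a matching grid point are left out of the result.
    """
    result = {}
    for node_id, (lon, lat) in nodes.items():
        key = grid_key(lon, lat)
        if key in grid:
            result[node_id] = grid[key]
    return result
```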
For this, we have used OpenStreetMap (OSM) [
43], which is a collaborative project for the creation of editable and free maps. We can find different libraries and tools such as
OSMnx [
44] for Python, which will allow us to analyse these maps in a coherent way.
Once the pollution values have been assigned to the nodes of the graph, the next step is to assign weights to the edges of the graph to convert it into a weighted graph. To determine the weights of the edges, the criterion based on the weighted average between the nodes that connect each edge will be used, considering both the pollution values, registered at that specific time and the distance between the nodes. This means that the weight of an edge will be the average of the pollution measurement values of the nodes connecting that edge, multiplied by the distance between the two nodes. This is because the travel time between two nodes is assumed to be directly proportional to the distance. In addition, the pollution to which people are exposed is proportional to the exposure time. Therefore, the pollution value and the length of the street segment are multiplied to obtain the cost function.
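The edge cost described above, the mean pollution of the two end nodes multiplied by the segment length, can be written directly. This is a minimal sketch with illustrative names. It assumes that each edge is given as a `(u, v, length_m)` tuple with the length in metres; graphs built with OSMnx store a comparable `length` attribute on each edge.

```python
def edge_cost(pollution_u, pollution_v, length_m):
    """Mean pollution of the two end nodes times the segment length."""
    return 0.5 * (pollution_u + pollution_v) * length_m


def weight_edges(node_pollution, edges):
    """Return {(u, v): cost} for an iterable of (u, v, length_m) edges."""
    return {
        (u, v): edge_cost(node_pollution[u], node_pollution[v], length_m)
        for u, v, length_m in edges
    }
```

For example, a 100 m segment between nodes with pollution values 10 and 30 gets a cost of 2000, the same as a 200 m segment between two nodes with a value of 10, so a longer but cleaner street can compete with a shorter, more polluted one.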
4.4. Healthy route planner
With the weighted graph in the area of user mobility with the pollution weights of the edges, we can proceed to find the path that minimises overall exposure to pollution.
To do this, we identify the node closest to our departure location and the node closest to our arrival location. The widely used Dijkstra algorithm (or shortest path first) typically utilises the edge lengths as the cost function; here, it is applied to the pollution weights of the edges instead. Then, using the functions provided by the OSMnx library, we construct the optimal route to reach the destination.
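For reference, the search step can be sketched as a plain Dijkstra implementation over a dictionary-based graph whose edge costs are the pollution weights. This is a self-contained illustration rather than the code used in the project, where a graph library routine would normally play this role (for instance, NetworkX's `shortest_path` with a `weight` attribute). It assumes non-negative costs and node identifiers that can be compared with each other, such as integers.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (total cost, path) of the minimum-cost route, or (inf, []) if none.

    graph: {node: {neighbour: non-negative cost}}
    """
    dist = {source: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale queue entry
        visited.add(u)
        if u == target:
            break
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    # Walk back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]
```

With pollution-weighted costs, the returned route is the one with the lowest cumulative exposure, which may differ from the shortest route in metres.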
Author Contributions
Conceptualization, J.S.G, S.F.C. and J.M.A.C.; methodology, R.F.J., J.M.A.C., J.S.G. and S.F.C.; software, R.A.C., R.F.J. and J.S.G.; validation, R.A.C., J.J.P.S., S.F.C.; investigation, J.J.P.S., S.F.C. and J.S.G,; resources, J.S.G. and S.F.C.; writing—original draft preparation, R.F.J., S.F.C and J.S.G.; writing—review and editing, J.J.P.S., R.A.C., and J.M.A.C. All authors have read and agreed to the published version of the manuscript.