Preprint
Article

Long-Term Assessment of PurpleAir Low-Cost Sensor for PM2.5 in California, USA

A peer-reviewed article of this preprint also exists.

This version is not peer-reviewed

Submitted:

30 June 2023

Posted:

04 July 2023

Abstract
Regulatory monitoring networks are often too sparse to support community-scale PM2.5 exposure assessment, while emerging low-cost sensors have the potential to fill in the gaps. Recent advances in air quality monitoring have produced portable, easy-to-use, low-cost, sensor-based monitors, which have given a new dimension to air pollutant monitoring and have democratized the air quality monitoring process by making monitors and results directly available at the community level. This study used PurpleAir© sensors for PM2.5 assessment in California, USA. Evaluation of PM2.5 from sensors included Quality Assurance and Quality Control (QA/QC) procedures, assessment with respect to reference monitored PM2.5 concentrations, and formulation of a decision support system integrating these observations using geostatistical techniques. The hourly and daily average observed PM2.5 concentrations from PurpleAir monitors followed the trends of observed PM2.5 at regulatory monitors. PurpleAir monitored PM2.5 also captured the peak PM2.5 concentrations due to incidents like forest fires. In comparison with reference monitored PM2.5 levels, PurpleAir PM2.5 concentrations were mostly higher. The most important reason for the higher PurpleAir PM2.5 concentrations was the inclusion of moisture or water vapor as aerosol, in contrast to FEM/FRM and non-FEM/FRM monitors, which measure PM2.5 excluding water content. In the long-term assessment (2016-2020), R2 was between 0.54 and 0.86 at selected collocated PurpleAir and regulatory monitors for hourly PM2.5 concentrations. Past research studies were mostly conducted over shorter time periods (<3-4 months) and reported higher R2 values, between 0.80 and 0.98. This study aims to provide reasonable estimations of PM2.5 concentrations with high spatiotemporal resolutions based on statistical models using PurpleAir measurements. The Kriging and IDW geostatistical interpolation techniques showed similar spatio-temporal patterns. Overall, this study revealed that low-cost, sensor-based PurpleAir monitors could be effective and reliable tools for episodic and long-term ambient air quality monitoring.
Keywords: 
Subject: Environmental and Earth Sciences  -   Pollution

1. Introduction

Epidemiological studies have long established the impact of fine aerosols on human health worldwide [1,2]. PM2.5 refers to atmospheric particulate matter (PM) with an aerodynamic diameter equal to or less than 2.5 micrometers, which is about 3% of the diameter of a human hair [3]. Exposure to higher PM2.5 concentrations is a greater threat to human health because of the particles' higher toxicity, stronger tendency to deposit deep in the lungs, and longer lifetime in the lungs [2]; it is linked to increased morbidity and mortality [3] and even to effects on the central nervous system [5]. The influence of fine ambient aerosol concentrations may be seasonal or episodic, with higher concentrations during winter. The emission sources of ambient PM2.5 can be both natural (volcanic dust, windblown dust, sea salt, etc.) and anthropogenic, and may be both local and regional, since PM2.5 may be transported over long distances [6]. PM2.5 is therefore a critical issue: emission sources may even be global while generating local air pollution, and the problem needs to be addressed in depth at both regional and local scales.
Hitherto, PM2.5 attainment demonstrations and exposure assessments have used PM2.5 concentration data from regulatory monitoring networks under the assumption that PM2.5 concentrations measured at fixed observation sites reasonably reflect ambient PM2.5 concentrations in surrounding areas. However, research studies such as [7,8] have established that PM2.5 concentrations may vary significantly in space within a region; therefore, PM2.5 concentrations observed at regulatory sites may not accurately represent the concentrations present near people who are concerned about possible health effects.
The monitoring methods and procedures promulgated by the United States Environmental Protection Agency (U.S. EPA), called Federal Reference Methods (FRMs) and Federal Equivalent Methods (FEMs), are used by all States and other monitoring organizations to measure outdoor air pollutants accurately and reliably, in order to evaluate the implementation of measures needed to attain the National Ambient Air Quality Standards (NAAQS) [9]. These regulatory monitoring networks are often too sparse to support community-scale PM2.5 exposure and air quality assessments, especially when communities are impacted by events like wildfires [10]. Sparse regulatory monitoring networks often result in poor statistical air quality and exposure assessments. Recently emerging low-cost sensors enable individuals to monitor PM concentrations at finer spatial and temporal resolutions in local and regional areas [10,11,12,13,14]. Low-cost PM2.5 sensors have the potential to fill in the gaps of regulatory monitoring networks and might overcome these limitations and improve statistical assessments [15].
Recent advances in air quality monitoring have produced portable, easy-to-use, low-cost, sensor-based monitors. These have given a new dimension to air pollutant monitoring and have democratized the air quality monitoring process by making monitors and results directly available at the community level in a cost-effective way (Farooqui et al., 2021). Sensor monitors can provide rich data for urban pollution monitoring at high spatio-temporal resolution that may be used for regulating air quality [16]. Low-cost sensors are useful for assessing air quality models at the finer scales required for urban air quality [12]. One such extensive low-cost, sensor-based monitoring network, PurpleAir© (https://www2.purpleair.com/), provides PM2.5 data to the public. It has over 10,000 monitors worldwide with a growth rate of ~30 per day [13]. In 2020, California had over 8,000 such active monitors, as shown in Figure 1. The sensors in these monitors count suspended particles in sizes of 0.3, 0.5, 1.0, 2.5, 5.0, and 10 µm. The sensors process the particle counts using a complex algorithm to calculate the PM1.0, PM2.5, and PM10 mass concentrations in µg/m3 [17].
Only a limited number of recent studies [18,19,20] have focused on evaluating low-cost PM2.5 monitors against regulatory monitored PM2.5 concentrations [21,22,23,24]. Although sensor-based monitors are still at a developing stage, they perform better for measuring PM than for gaseous pollutants. Moreover, these studies assessed the low-cost sensors over short periods of one to four months. Sensor degradation and the seasonal variability affecting PM2.5 have not been studied much over longer time periods. This study focuses on the long-term assessment of low-cost sensor monitors since their deployment, and specifically uses PurpleAir monitored PM2.5. Long-term assessment of PurpleAir monitored PM2.5 addresses the response of the sensor to varying meteorology (relative humidity and temperature).
The reliability of PurpleAir monitored PM2.5 concentrations with respect to reference monitors is still an open question. Research studies have addressed the accuracy and precision issues related to sensor-based PM2.5 concentrations [22,24,25], highlighting the bias associated with sensor-based PM2.5 levels. Since there are no established best procedures, practices, and guidelines on operation and maintenance for these monitors, it is essential to conduct quality assurance and quality control (QA/QC) of the datasets before applying them in air quality assessments and integrated air quality decision support systems.
The aim of this study was to assess PurpleAir PM2.5 sensors over the long term against reference monitored PM2.5 concentrations at selected sites across California (Figure 1) and to formulate a decision support system integrating these observations using geostatistical techniques. Geostatistical interpolation techniques such as Inverse Distance Weight (IDW) [26] and Kriging [27] were applied to PurpleAir PM2.5 concentrations to assess whether these sensors can fill in the gaps of regulatory monitors. The geostatistically predicted and observed PM2.5 concentrations were evaluated qualitatively and quantitatively. This study aimed to deepen the understanding of the behavior of PurpleAir PM2.5 sensors over longer time periods and to assess whether they provide reasonable estimations of PM2.5 concentrations with high spatio-temporal resolution over extended time periods. The sensor data were then integrated with data from reference monitors to understand the spatial distribution of PM2.5 concentrations over the state of California. Beyond evaluating sensor performance through different types of statistical correlations with reference monitors, this study also investigates the degree to which sensor data can reproduce, over the long term, the temporal patterns and episodic events such as wildfires seen by high-resolution reference monitors.

2. Methodology

2.1. PurpleAir PM2.5 QA/QC

PurpleAir PM2.5 5-minute data were downloaded from the very first data record in August 2016 through December 31, 2023 from https://www2.purpleair.com/ for the entire State of California and neighboring states. Data were updated through May 14, 2023 only for collocated sensors. The dataset was raw, without any correction or adjustment. Therefore, quality assurance and quality control (QA/QC) routines for the data were developed and applied. PurpleAir monitors contain two PM2.5 sensors, channels A and B. Data are stored and transmitted through both channels, which provides a measure for quality control of the data. Therefore, data in this study were cleaned and considered valid only if the difference between channels A and B was within the limits discussed below. The 5-minute averaged data for the years 2016-2023 were downloaded from online sensors, then processed using a Python script and analyzed. The atmospheric PM2.5 variable labeled “pm2_5_atm” was used in this work. The three criteria used for QA/QC of the 5-minute PurpleAir PM2.5 data for all sensor monitors were as follows:
  • 5-minute PurpleAir PM2.5 for all monitors
    • for PurpleAir PM2.5 ≤ 0.3 µg/m3 : Invalid
    • for PurpleAir PM2.5 between > 0.3 and ≤ 100 µg/m3 : if difference between Channel A and B within ± 10 µg/m3 : Valid
    • for PurpleAir PM2.5 > 100 µg/m3 : if difference between Channel A and B within ± 10 % : Valid
    • for PurpleAir PM2.5 > 500 µg/m3 : Invalid
  • Hourly average calculated with only valid 5-minute data.
  • Daily average calculated with only valid hourly averages; a day is considered valid if at least 20 hourly averages are available.
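The criteria listed above can be sketched in a few lines of Python. This is only an illustration, not the script used in the study: the function names are hypothetical, and two points that the text leaves open are filled in as assumptions, namely that the concentration compared with the thresholds is the mean of channels A and B, and that the 10 % tolerance is taken relative to that mean.

```python
from statistics import mean


def is_valid_5min(pm_a, pm_b):
    """Check one 5-minute record against the channel-agreement rules.

    Assumption: the value tested against the thresholds is the mean of
    channels A and B, and the 10 % tolerance is relative to that mean.
    """
    pm = (pm_a + pm_b) / 2.0
    diff = abs(pm_a - pm_b)
    if pm <= 0.3 or pm > 500:
        return False                  # outside the accepted range
    if pm <= 100:
        return diff <= 10.0           # absolute tolerance, ug/m3
    return diff <= 0.10 * pm          # relative tolerance, 10 %


def hourly_average(records):
    """Average the valid 5-minute (pm_a, pm_b) records of one hour."""
    valid = [(a + b) / 2.0 for a, b in records if is_valid_5min(a, b)]
    return mean(valid) if valid else None


def daily_average(hourly_values, min_hours=20):
    """Average hourly values; the day needs at least `min_hours` valid hours."""
    valid = [h for h in hourly_values if h is not None]
    return mean(valid) if len(valid) >= min_hours else None
```

Under these assumptions, a day with only 19 valid hourly averages yields no daily average, while a day with 20 or more does.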
Raw data carry some peculiar challenges, and the PurpleAir PM2.5 data are no exception. PurpleAir monitors are also installed indoors, and for a few of the monitors the location label (‘outdoor’ or ‘indoor’) was missing. For the monitors missing the location label, the tests below were performed and the monitors were labelled accordingly.
  • Daily minimum and maximum temperature ‘temp_f’ were calculated from average hourly data.
  • Difference between daily maximum and daily minimum temperature were calculated
  • Number of days with daily difference ‘temp_f’ of > 10 F and ≤ 10 F were counted.
  • Monitors for which the ‘number of days (daily difference) > 10 F’ was greater than the ‘number of days (daily difference) ≤ 10 F’ were not considered.
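The day-counting part of this test can be illustrated with a short Python sketch. The function name and the input layout (a dictionary mapping each day to its hourly mean ‘temp_f’ values) are hypothetical and chosen only for the example; the sketch returns the two counts and leaves the final labelling decision to the rule stated in the list.

```python
def count_temperature_range_days(daily_hourly_temps_f, threshold_f=10.0):
    """Count days with a daily temp_f range above / at or below a threshold.

    daily_hourly_temps_f: dict mapping a day label to the list of hourly
    average temp_f values for that day (hypothetical input layout).
    Returns (days_with_range_above, days_with_range_at_or_below).
    """
    wide = narrow = 0
    for temps in daily_hourly_temps_f.values():
        if not temps:
            continue                          # no data for that day
        if max(temps) - min(temps) > threshold_f:
            wide += 1
        else:
            narrow += 1
    return wide, narrow
```

The resulting pair of counts is what the last step of the list compares for each unlabelled monitor.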
Two further challenges remain. First, the algorithm that converts particle counts to mass is currently not available to the public. Second, the identity (‘id’) number of a monitor stays the same when its location or geo-coordinates change. This happens when, for some reason, a monitor is moved from one corner of a building to another and/or from one building to another. After performing QA/QC on PurpleAir PM2.5 concentrations, only valid data were used in this analysis. As of now, over 8,000 outdoor PurpleAir monitors are distributed across all counties of California, as shown in Figure 1. Some sites had over 5 years of data, while others had data from a single week or season.

2.2. Geostatistical Interpolation

Two geostatistical techniques, Inverse Distance Weighting (IDW) and Kriging, were used to estimate PM2.5 concentrations at monitored and un-monitored locations. These two methods are explained briefly below. Figure 2 shows the flow diagram of the work in this study. Daily average PurpleAir PM2.5 concentrations were used in Kriging and IDW to interpolate PM2.5 concentrations across California. The interpolated PM2.5 concentrations were extracted at a few selected FEM/FRM and non-FEM/FRM sites across California. The interpolated PM2.5 concentrations were then evaluated against observed daily average PM2.5 from FEM/FRM and non-FEM/FRM monitors available from the U.S. EPA AQS system [28].

2.2.1. Kriging

Kriging is a geostatistical interpolation tool in which the interpolated values are modelled by a Gaussian process governed by prior covariances. Under suitable assumptions, Kriging gives the best linear unbiased prediction of the intermediate values. The method is widely used in spatial analysis and computer experiments. Kriging determines the spatial structure of the outputs from the known inputs through variogram/semi-variogram analysis; the variogram (semi-variogram) is the variance (half the variance) of the difference between input data values and is a measure of association in geostatistics [29]. To relate PurpleAir PM2.5 to regulatory monitored PM2.5, the Kriging tool was used with PurpleAir monitored daily averaged PM2.5 to estimate PM2.5 concentrations at regulatory PM2.5 monitoring sites. Daily average PurpleAir PM2.5 was calculated from hourly average PM2.5 concentrations as described earlier. The Kriged PM2.5 concentrations at a few regulatory monitors were extracted and evaluated against observed PM2.5.
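To make the semi-variogram idea concrete, the sketch below computes an empirical semi-variogram with the classical (Matheron) estimator, in which each lag bin holds half the mean squared difference between pairs of values separated by that distance. It is an illustrative sketch only: the text does not state which variogram model or software was used, the function name is hypothetical, and the coordinates are assumed to be planar (projected) so that Euclidean distance applies.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, values, bin_width):
    """Classical estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)).

    points: list of (x, y) planar coordinates (assumed projected).
    values: concentration at each point.
    Returns {bin_index: gamma}, where bin k covers lag distances
    [k * bin_width, (k + 1) * bin_width).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            k = int(h // bin_width)
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return {k: sums[k] / (2 * counts[k]) for k in sorted(counts)}
```

In a Kriging workflow, a model curve is typically fitted to these binned values and then used to derive the interpolation weights.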

2.2.2. Inverse Distance Weight

Inverse Distance Weight is a deterministic way of estimating concentrations at unmonitored locations; here, the points of interest are the regulatory monitors and the known values are the PurpleAir PM2.5 concentrations. The concentration assigned to a regulatory monitor was calculated as a weighted average of the PurpleAir PM2.5 concentrations available at the known points. The name of this family of methods comes from the weighted average applied, since it uses the inverse of the distance to each known point ("amount of proximity") when assigning weights. The estimated concentration is given by:
$$P_{Est.} = \frac{\sum_{i=1}^{n} \dfrac{P_i}{d_i^{p}}}{\sum_{i=1}^{n} \dfrac{1}{d_i^{p}}}$$
where $P_{Est.}$ is the estimated concentration at the regulatory monitor, $d_i^{p}$ is the distance from the unmonitored location to the $i$-th monitored point raised to the power $p$, and $P_i$ is the concentration at the $i$-th monitored location. Better accuracy is achieved with the power $p$ equal to 2. Due to the sparse network of existing air quality monitors, the maximum number of observed data points $n$ is set to five. The five nearest PurpleAir monitors to each regulatory monitoring site were identified for each day.
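The IDW estimate described above can be sketched as follows, with p = 2 and the five nearest monitors as defaults. The function name and input layout are hypothetical, coordinates are assumed to be planar so that Euclidean distance can be used, and returning the sensor value directly when a sensor sits exactly at the target is a choice made for this example rather than something stated in the text.

```python
import math


def idw_estimate(target, sensors, power=2, n_nearest=5):
    """Inverse-distance-weighted estimate at `target`.

    target: (x, y) planar coordinates of the regulatory site.
    sensors: list of ((x, y), concentration) for PurpleAir monitors.
    """
    if not sensors:
        raise ValueError("at least one sensor is required")
    # Keep only the n_nearest sensors to the target location.
    ranked = sorted(sensors, key=lambda s: math.dist(target, s[0]))[:n_nearest]
    num = den = 0.0
    for xy, conc in ranked:
        d = math.dist(target, xy)
        if d == 0.0:
            return conc               # collocated sensor: use its value
        w = 1.0 / d ** power
        num += w * conc
        den += w
    return num / den
```

For example, with sensors reading 10 and 20 µg/m3 at distances 1 and 2 from the target, the weights are 1 and 0.25, giving an estimate of 12 µg/m3, closer to the nearer sensor.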

3. Results and Discussions

3.1. Observed PurpleAir and Regulatory PM2.5

It was imperative to evaluate and validate PurpleAir PM2.5 against observed PM2.5 at regulatory sites. Figure 1 also shows the FEM/FRM and non-FEM/FRM PM2.5 monitoring sites in California between 2016 and 2023. Details of these sites are given in Table 1, with AQS ID, site name, PurpleAir monitor ID, and dates of monitoring. It also shows the approximate distance between the PurpleAir monitor and the regulatory site. Regulatory site data were downloaded from the EPA AQS Datamart for 2016 to 2022 [30]. The most recent PM2.5 data, available through October 2022, were used for the analysis. PurpleAir monitored PM2.5 was evaluated graphically and statistically against both FEM/FRM and non-FEM/FRM monitored PM2.5. All regulatory sites with a PurpleAir monitor within 20 meters were analysed. Time-series and scatter plots are shown only for four FEM/FRM and four non-FEM/FRM sites, selected to cover California from north to south for discussion.
Table 1. Statistical assessment of hourly average PurpleAir PM2.5 at selected FEM/FRM and non-FEM/FRM sites between 2016 and 2022.
Site Name and AQS ID, POC | PurpleAir Sensor Index | Distance (m) | Dates Duration | Num. of Paired Observations (#) | R2 | Mean Bias (µg/m3) | RMSE (µg/m3)
FEM/FRM
El Rio-Rio Mesa Schl. 061113001, 3 | 9594 | 0.18 | 4/2/18 to 8/31/22 | 30,798 | 0.50 | 3.5 | 7.85
Fresno-Garland 060190011, 3 | 2358 | 1.87 | 7/31/17 to 12/9/19 | 17,194 | 0.83 | 5.7 | 11.85
Goleta-Fairview 060832011, 1 | 16705 | 0 | 9/29/18 to 6/30/22 | 28,532 | 0.56 | 3.4 | 6.93
Lompoc 060832004, 1 | 16703 | 0 | 9/29/18 to 6/30/22 | 28,482 | 0.60 | 2.1 | 6.21
Non-FEM/FRM
Bakersfield 060290014, 3 | 2350 | 0.6 | 7/31/17 to 3/8/22 | 34,583 | 0.73 | 4.3 | 11.70
Calexico-Ethel Street 060250005, 3 | 1174 | 0.0 | 10/24/17 to 2/19/18 | 2,486 | 0.89 | 3.5 | 11.60
Sacramento-T Street 060670010, 3 | 8440 | 2.1 | 2/2/19 to 12/11/20 | 14,797 | 0.86 | 4.2 | 10.20
Riverside 060658001, 9 | 1854 | 8.5 | 7/10/17 to 12/31/19 | 14,604 | 0.69 | 5.7 | 9.70
Table 2. Sensor data summary over different regions in California.
PM2.5 (µg/m3) | Bay Area (2019) | Sac Metro (2019) | San Diego (2019) | SJV (2019) | South Coast (2019) | Bay Area (2018) | Sac Metro (2018) | San Diego (2018) | SJV (2018) | South Coast (2018)
Count | 184,668 | 8,918 | 5,240 | 4,497 | 115,764 | 31,203 | 2,255 | 2,355 | 11,242 | 103,248
Maximum | 201 | 101 | 217 | 155 | 185 | 251 | 300 | 73 | 277 | 306
Mean | 7 | 10 | 11 | 13 | 13 | 15 | 23 | 13 | 20 | 15
Median | 5 | 6 | 9 | 8 | 10 | 7 | 11 | 11 | 13 | 12
Std. Deviation | 8 | 12 | 9 | 15 | 11 | 26 | 41 | 10 | 21 | 12
Standard Error | 0.02 | 0.13 | 0.13 | 0.1 | 0.03 | 0.15 | 0.86 | 0.2 | 0.2 | 4
Q1 | 2 | 3 | 5 | 4 | 5 | 3 | 4 | 6 | 6 | 7
Q3 | 9 | 11 | 14 | 14 | 17 | 15 | 25 | 18 | 29 | 21
Inter Quartile Range | 7 | 8 | 9 | 10 | 12 | 12 | 21 | 12 | 23 | 14
Figure 3 and Figure 4 show hourly average PM2.5, in black lines, at four FEM/FRM and four non-FEM/FRM sites and PurpleAir PM2.5 in purple dots. From these figures, it is very clear that PurpleAir monitors captured the trend of PM2.5 at regulatory monitors from 2016 to 2022. PurpleAir observed higher PM2.5 concentrations for both FEM/FRM and non-FEM/FRM regulatory monitors. They also captured the PM2.5 events due to forest fires along with regulatory monitors. PurpleAir PM2.5 followed the trends of regulatory monitors for both less than 100 µg/m3 and greater than 100 µg/m3 PM2.5 concentrations. PM2.5 above 200 µg/m3 were captured by PurpleAir at Fresno-Garland (Figure 3(c)) and all non-FEM/FRM sites with an exception of one day spike at El Rio-El Rio Mesa School (Figure 3(d)). Spikes in PM2.5 concentrations at Sacramento-T Street (Figure 4(c)) were observed due to forest fire and the trend can be seen by both regulatory and PurpleAir monitors. PM2.5 episodic events from windblown dust at Calexico-Ethel Street (Figure 4(a)) were also captured by both regulatory and PurpleAir monitors. Thus, the sensors have been able to capture local as well as regional episodic events.
Figure 5 shows scatter plots with hourly average PurpleAir PM2.5 concentrations on the y-axis and regulatory monitored PM2.5 concentrations on the x-axis. These plots show that PurpleAir monitors recorded higher concentrations than regulatory monitors most of the time. The scatter plots also show +/- 25% dotted lines; the majority of points fell outside the +/- 25% range, with more points towards the y-axis, i.e. towards higher PurpleAir PM2.5. The linear fit line for all sites is on the positive side of +25%. Only the El Rio School site (Figure 5(b)) showed a one-to-one linear fit.
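The share of points inside the +/- 25% band can be quantified with a small helper. This is a hedged sketch: it assumes the dotted lines are y = 0.75x and y = 1.25x around the one-to-one line, which the text does not state explicitly, and the function name is hypothetical.

```python
def fraction_within_band(purpleair, regulatory, band=0.25):
    """Fraction of paired points with (1 - band) * R <= P <= (1 + band) * R.

    Assumption: the band is defined relative to the regulatory value,
    i.e. the dotted lines are y = (1 +/- band) * x.
    Pairs with a non-positive regulatory value are skipped.
    """
    pairs = [(p, r) for p, r in zip(purpleair, regulatory) if r > 0]
    if not pairs:
        raise ValueError("no usable pairs")
    inside = sum(1 for p, r in pairs
                 if (1 - band) * r <= p <= (1 + band) * r)
    return inside / len(pairs)
```

A value well below 0.5 would correspond to the situation described above, where most points lie outside the band.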
PurpleAir monitored PM2.5 concentrations were mostly higher than the regulatory monitored PM2.5 concentrations. This may be because PurpleAir monitors were calibrated by the manufacturer using particles with properties completely different from particulate matter in ambient air [31], and because the conversion of particle counts to mass is also unknown [15]. In addition, ambient air also contains water droplets within the relevant aerodynamic size range. Traditionally, both FEM/FRM and non-FEM/FRM monitors measure PM2.5 after removing the water content at the sample inlet, which is achieved by heating the sample air in the inlet pipe. PurpleAir sensors, on the contrary, measure PM2.5 concentrations without removing the moisture content of the aerosol. It is the water content of the ambient air that makes the PM2.5 measured by PurpleAir an “Absolute PM2.5” or, in the context of regulatory monitors, a “Wet PM2.5”. Whether any adjustment for water content is made during the conversion from particle count to mass is unknown. Therefore, even before comparing PurpleAir PM2.5 with FEM/FRM and non-FEM/FRM monitored PM2.5, the PurpleAir PM2.5 concentrations can be expected to be greater than those from regulatory monitors most of the time.
Table 1 shows the statistical evaluation of PurpleAir monitors in comparison with regulatory monitors. For the statistical evaluation of PM2.5, the coefficient of determination (R2), mean bias (MB), and root mean square error (RMSE) were calculated. Mean bias estimates the average bias between two variables. The coefficient of determination, R-squared (R2), indicates how well a regression model fits the observed data. The Root Mean Square Error (RMSE) is a frequently used measure of the magnitude of the error between two variables. The equations of the evaluation indices are shown below:
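$$MB = \frac{1}{n}\sum_{i=1}^{n}\left(P_i - R_i\right)$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - R_i\right)^{2}}$$
$$R^{2} = \left(\frac{\sum_{i=1}^{n} P_i R_i - \dfrac{\sum_{i=1}^{n} P_i \sum_{i=1}^{n} R_i}{n}}{\sqrt{\sum_{i=1}^{n} P_i^{2} - \dfrac{\left(\sum_{i=1}^{n} P_i\right)^{2}}{n}}\;\sqrt{\sum_{i=1}^{n} R_i^{2} - \dfrac{\left(\sum_{i=1}^{n} R_i\right)^{2}}{n}}}\right)^{2}$$
where $P_i$ is the PurpleAir PM2.5 concentration, $R_i$ is the regulatory PM2.5 concentration, $\bar{R}$ is the mean of $R_i$, $\bar{P}$ is the mean of $P_i$, and $n$ is the number of hourly samples.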
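The three indices translate directly into code. The sketch below is a plain-Python illustration of the formulas above, not the study's script; the function names are hypothetical and the inputs are assumed to be equal-length lists of paired hourly values with non-zero variance.

```python
import math


def mean_bias(p, r):
    """MB: average of (P_i - R_i) over all pairs."""
    return sum(pi - ri for pi, ri in zip(p, r)) / len(p)


def rmse(p, r):
    """RMSE: square root of the mean squared difference."""
    return math.sqrt(sum((pi - ri) ** 2 for pi, ri in zip(p, r)) / len(p))


def r_squared(p, r):
    """R2: squared Pearson correlation, using the sums-based form."""
    n = len(p)
    sp, sr = sum(p), sum(r)
    num = sum(pi * ri for pi, ri in zip(p, r)) - sp * sr / n
    den = (math.sqrt(sum(pi * pi for pi in p) - sp * sp / n)
           * math.sqrt(sum(ri * ri for ri in r) - sr * sr / n))
    return (num / den) ** 2
```

A sensor that reads exactly twice the reference, for instance P = [2, 4, 6, 8] against R = [1, 2, 3, 4], gives R2 = 1 while the mean bias is 2.5 µg/m3. This is why R2 alone cannot show whether a sensor over-reads, and why MB and RMSE are reported alongside it.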
For all FEM/FRM sites (Table 1 and Table S1), R2 values were between 0.23 and 0.9 with an average of 0.62, and for all non-FEM/FRM sites (Table 1 and Table S2), R2 values were between 0.27 and 0.92 with an average of 0.74; these values are lower than those reported by studies conducted over shorter durations (SCAQMD, 2020; LRAPA, 2021; Gupta et al., 2018). Goleta, El Rio-El Rio School, and Lompoc-H Street showed the lowest R2 values of 0.56, 0.5, and 0.6 respectively. These three sites are along the coastline of Southern California, where the moisture content of the air is expected to be higher than in inland areas. This is consistent with moisture content playing a significant role in PurpleAir PM2.5 monitoring. Hygroscopic PM takes up moisture from the air, which results in higher reported concentrations. As of now, PurpleAir monitors do not heat the inlet air, unlike regulatory monitors. The rest of the sites, located inland, showed higher R2 values of greater than 0.70. The mean bias is highest at Fresno-Garland (9.62 µg/m3), followed by Sacramento-T Street (6.93 µg/m3), as shown in Supplement Tables S1 and S2. Mean bias was positive at all sites, showing higher PM2.5 from PurpleAir than from FEM/FRM and non-FEM/FRM monitors.
After validating the performance of PurpleAir sensors against observed data, the sensor data were used to compute detailed summary statistics across different regions of California: Bay Area Air Quality Management District (AQMD) (Bay Area), Sacramento Metropolitan AQMD (Sac Metro), San Diego Air Pollution Control District (APCD) (San Diego), San Joaquin Valley APCD (SJV), and South Coast AQMD (South Coast), according to the availability of data for the years 2016-2019. After excluding poorly performing sensors (around 4%), all PurpleAir sensors were used in this statistical analysis. Table 2 presents the statistical analysis for the two more recent years, 2018 and 2019.
A wide range of PM2.5 concentrations was seen across the sensor dataset, with a maximum 24-hour average of around 300 μg/m3 measured in the South Coast area near Los Angeles, and with maximum concentrations in northern California showing the impact of wildfires. Overall, the median PM2.5 concentration of the dataset was between 5 and 13 μg/m3 (interquartile range: 7 to 23 μg/m3). For the individual counties, the standard deviation ranged from 8 to 40 μg/m3 across the entire state of California.
In the Bay Area, the average daily mean across sensors decreased from 15 µg/m3 in 2018 to 7 µg/m3 in 2019. The standard deviation was also lower in 2019 (8 µg/m3) than in 2018 (15 µg/m3). The number of sensors in the Bay Area also increased significantly, from around 369 in 2018 to around 773 in 2019, a growth of about 110 percent. The interquartile range in 2019 (around 6 µg/m3) was also significantly lower than in 2018 (around 12 µg/m3). The other two areas, Sac Metro and San Diego, exhibited more modest growth in the number of sensors (18 percent in San Diego and 94 percent in Sac Metro) than the Bay Area. The southern part of California (South Coast) also had over 450 sensors in 2019.
Overall, the PM2.5 values exhibit less magnitude and variability over the state of California showing improving trends in PM2.5 concentrations. The wildfires were more intense in 2018 than in 2019 as seen in the maxima values and standard deviation values in Bay Area.
The ANOVA (Analysis of Variance) test was performed to determine whether the daily mean PM2.5 levels measured by the sensors across the state of California were identical or showed any significant difference during the study period (2017-2019). The ANOVA test indicated a significant difference amongst the variances of the daily averages at the 95% confidence level (p=0.00), with F statistic >0, for the different years and for the different regions. The ANOVA and Tukey test results for the Bay Area for the years 2017, 2018 and 2019 are given in Table 3 as a representative result.
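For reference, the F statistic of a one-way ANOVA is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it in plain Python for illustration; it is not the software used in the study, the function name is hypothetical, and it does not compute the p-value (a library routine such as `scipy.stats.f_oneway` returns both the statistic and the p-value).

```python
def one_way_anova_f(groups):
    """F = (SSB / (k - 1)) / (SSW / (N - k)) for k groups, N values in total.

    groups: list of lists, e.g. the daily mean PM2.5 values of each year.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group and within-group sums of squares.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ssb / (k - 1)) / (ssw / (n_total - k))
```

A large F means the group means differ by more than the scatter within each group would explain; the p-value then comes from the F distribution with (k - 1, N - k) degrees of freedom.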
Since the variances of the daily means were different, a post hoc test (Tukey post hoc test) was performed for detailed multiple comparisons. According to the Tukey test, the highest daily means of PM2.5 levels were observed in 2018 for the Bay Area (Table 4). However, there were no significant differences in variances of daily mean concentrations for the Bay Area between the years 2017 and 2018 (p>0.05), although the differences were significant with respect to 2019 (Table 4). In the case of San Diego, differences in variances of daily mean concentrations for all years were significant (p=0.00). For South Coast and SJV, the highest daily means were observed in 2016. For South Coast, there were no significant differences in variances of daily mean concentrations of PM2.5 between 2016 and 2018 (p>0.05), but both differed significantly from 2017 and 2019 (p=0.00). For SJV, the differences in variances of daily means were minimal for the years 2016 and 2017 (p>0.05) and significant for the other two years, 2018 and 2019 (p=0.00). Therefore 2016, 2017 and 2018 may be considered the period of the highest daily PM2.5 concentrations measured in California.
The results for these other areas are included in the supplementary material. According to the ANOVA and multi-comparison Tukey test, the lowest daily mean PM2.5 levels were measured in 2019, indicating a decreasing trend of daily PM2.5 concentrations in the study area. The results are also corroborated in Table 4.
A region-wise inter-comparison (Bay Area, San Diego, San Mateo, SJV and South Coast) of daily means of sensor data in California for the year 2018 (the year in which California had the highest number of wildfires), using ANOVA (Table 5) and Tukey’s multi-comparison post hoc test (Table 6), revealed the highest daily mean concentrations in the Bay Area (probably due to forest fires), followed closely by South Coast, which is more polluted due to its proximity to Los Angeles. The differences between Bay Area and South Coast were not significant (p>0.05). However, the differences between these two regions and the others (San Mateo, San Diego and SJV) were significant (p=0.00), with F statistic >0.

3.2. Geostatistically Predicted and Observed PM2.5

The regulatory monitoring network is too sparse to support community-scale PM2.5 exposure assessments. The PurpleAir monitoring network is denser than the existing regulatory monitoring network, reaching community scale and covering California spatially. Geostatistical interpolation techniques, Kriging and IDW, applied to PurpleAir PM2.5 might help to bridge the gap between PurpleAir and regulatory monitored PM2.5. Interpolation was done using daily average PurpleAir PM2.5 for the years 2018 and 2020, as PurpleAir monitoring began in California in 2016 and fewer monitors were in operation until the end of 2017. Figure 6 shows PurpleAir, FEM/FRM, and non-FEM/FRM daily average PM2.5 on November 16, 2018, interpolated by Kriging and IDW. Both interpolation techniques captured the smoke dispersion from the Camp Fire, which started on November 8, 2018 [32]. The difference in spatially interpolated daily average PurpleAir PM2.5 in the northern part of California was due to the difference in interpolation approach between Kriging and IDW. The interpolated PM2.5 from PurpleAir gave a better representation of PM2.5 because of the dense set of PM2.5 monitors available for interpolation, in comparison to the sparse network of FEM/FRM and non-FEM/FRM monitors. For further analysis, four regulatory sites across California without collocated PurpleAir monitors were selected for assessment. Collocated sites were not selected in order to avoid the influence of PurpleAir PM2.5 monitored at the same location.
Figure 7 shows observed daily average PM2.5 concentrations as a black line together with interpolated PM2.5 concentrations at the four regulatory monitoring sites mentioned above. The time-series plots show good agreement between observed and interpolated PM2.5. Both the IDW and Kriging methods captured the peaks of observed PM2.5. However, on many days Kriging and IDW over-predicted PM2.5, as shown in Figure 8. The over-prediction may be due to the higher PM2.5 observed by PurpleAir monitors. Scatter plots with interpolated PM2.5 on the y-axis and regulatory PM2.5 on the x-axis show good agreement, and most of the interpolated values fall within +/- 25%. Both geostatistical techniques, Kriging and IDW, demonstrated that they can be used to interpolate daily average PurpleAir PM2.5 at un-monitored locations for exposure and air quality assessments. The agreement between geostatistically interpolated PurpleAir and observed daily average PM2.5 gives confidence in using PurpleAir PM2.5 with regulatory monitors to estimate PM2.5 at unmonitored locations. This demonstrates that low-cost PM2.5 sensors have the potential to fill in the gaps of regulatory monitoring networks and might help to overcome their limitations and improve air quality and other scientific assessments. PurpleAir PM2.5 can be integrated and used with observed regulatory PM2.5 to formulate a decision support system using geostatistical techniques, but the uncertainty due to sensor measurements should be minimized before the data are used to supplement regulatory monitors.
Table 7 shows the statistical evaluation of interpolated daily averaged PurpleAir PM2.5, using Kriging and IDW techniques, against daily averaged observed PM2.5 concentrations. The PM2.5 interpolated by Kriging has lower Root Mean Square Error (RMSE) and Mean Bias (MB) values than that interpolated by IDW. Correlation coefficient values at the Oakland-West and Stockton-Hazelton sites were above 0.76 and were lower at the Mira Loma and Otay Mesa sites.

4. Conclusion

Recently emerged low-cost sensor-based monitoring technology has given a new dimension to air quality monitoring. Due to their portability and low cost, sensors have made community-based micro-environment monitoring of air pollutants possible by providing access to local community members and enabling them to be a part of the air quality monitoring process. Currently, the PurpleAir monitoring network is the densest sensor-based PM2.5 monitoring network existing on a global scale. On account of its high density of air quality sensors, this network has successfully achieved the objectives of educating the community about air pollution and alerting it to higher PM2.5 concentrations due to incidents like forest fires. However, due to the lack of best operational procedures, practices, and guidelines, this publicly available dataset cannot be used for air quality and other scientific assessments without QA/QC. The evaluation of PurpleAir PM2.5 for California conducted in this study included QA/QC procedures, assessment with reference to monitored PM2.5 concentrations, and formulation of a decision support system integrating these sensor-based observations using geostatistical techniques.
The hourly and daily average PM2.5 concentrations observed by PurpleAir monitors generally followed the trends of PM2.5 levels observed at regulatory monitors. PurpleAir monitors also captured the major PM2.5 peaks due to incidents such as forest fires over the five-year period. Compared with reference monitored PM2.5 levels, PurpleAir PM2.5 concentrations were mostly higher. Over the long-term period, the coefficient of determination (R2) was between 0.54 and 0.86 for selected PurpleAir monitors collocated with both FEM/FRM and non-FEM/FRM monitors.
PurpleAir monitors can fill gaps in PM2.5 data coverage at the local scale. Kriging and IDW produced similar spatial and temporal patterns when interpolating PurpleAir PM2.5, but the uncertainty in the sensor measurements should be minimized before these data are used to supplement regulatory monitors. Low-cost sensor-based monitors also need to be integrated with regulatory monitors to provide observations at the higher spatiotemporal resolution needed for regulatory and policy purposes. They are valuable tools at the community level for assessing air quality and building awareness among citizens of the risks of air pollution, as reflected in the substantial increase in the number of sensors across the State of California over the years. Although there is an overall decrease in PM2.5 concentrations, problem areas remain due to wildfires in Northern California and local air pollution in Southern California, and these require further study and the development of mitigation strategies. The large number of sensors would help enhance the spatial density of observations. Overall, this study showed that, despite their shortcomings, low-cost PurpleAir sensor-based measurements can be an effective tool for ambient air quality monitoring. The effectiveness of low-cost sensors in this study suggests that sensor networks could be expanded worldwide, especially in developing countries where regulatory air quality monitors are scarce, to investigate high PM2.5 concentrations. This would entail building a global roadmap for the scientific community on the use of these sensors for air quality assessments and for evaluating the subsequent impact on human health.

Author Contributions

The authors' individual contributions are as follows: conceptualization and methodology, Z. Farooqui and J. Biswas; software, Z. Farooqui and J. Saha; validation and writing (review and editing), J. Biswas; formal analysis, Z. Farooqui, J. Biswas, and J. Saha; data curation and visualization, Z. Farooqui; writing (original draft preparation), Z. Farooqui.

Funding

The work was independent research, and no funding was received.

Data Availability Statement

PurpleAir and EPA AQS data are publicly available.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pope, C.A., III; Ezzati, M.; Dockery, D.W. Fine-Particulate Air Pollution and Life Expectancy in the United States. N. Engl. J. Med. 2009, 360, 376–386.
  2. Chen, X.-C.; Jahn, H.J.; Engling, G.; Ward, T.J.; Kraemer, A.; Ho, K.-F.; Yim, S.; Chan, C.-Y. Chemical characterization and sources of personal exposure to fine particulate matter (PM2.5) in the megacity of Guangzhou, China. Environ. Pollut. 2017, 231, 871–881.
  3. Pozzer, A.; Bacer, S.; Sappadina, S.D.Z.; Predicatori, F.; Caleffi, A. Long-term concentrations of fine particulate matter and impact on human health in Verona, Italy. Atmos. Pollut. Res. 2019, 10, 731–738.
  4. Pope, C.A., III; Dockery, D.W. Health Effects of Fine Particulate Air Pollution: Lines that Connect. J. Air Waste Manag. Assoc. 2006, 56, 709–742.
  5. Shou, Y.; Huang, Y.; Zhu, X.; Liu, C.; Hu, Y.; Wang, H. A review of the possible associations between ambient PM2.5 exposures and the development of Alzheimer's disease. Ecotoxicol. Environ. Saf. 2019, 174, 344–352.
  6. Ghosh, S.; Biswas, J.; Guttikunda, S.; Roychowdhury, S.; Nayak, M. An investigation of potential regional and local source regions affecting fine particulate matter concentrations in Delhi, India. J. Air Waste Manag. Assoc. 2014, 65, 218–231.
  7. Pinto, J. P., Lefohn A. S., and Shadwick, D. S. Spatial Variability of PM2.5 in Urban Areas in the United States, J. of Air & Waste Manag. Assoc. 2004, 54:4, 440-449.
  8. Wang, Y., Li, J., Jing, H., Zhang, Q., Jiang, J., and Biswas, P. Laboratory Evaluation and Calibration of Three Low-Cost Particle Sensors for Particulate Matter Measurement, Aerosol Sci. Tech. 2015, 49, 1063–1077.
  9. U.S. Environmental Protection Agency (U.S. EPA): EPA scientists develop Federal Reference & Equivalent Methods for measuring key air pollutants, available at: https://www.epa.gov/airresearch/epa-scientists-develop-federal-reference-equivalentmethods-measuring-key-air (accessed on 10/09/2020a).
  10. Gupta, P.; Doraiswamy, P.; Levy, R.; Pikelnaya, O.; Maibach, J.; Feenstra, B.; Polidori, A.; Kiros, F.; Mills, K.C. Impact of California Fires on Local and Regional Air Quality: The Role of a Low-Cost Sensor Network and Satellite Observations. GeoHealth 2018, 2, 172–181.
  11. Wallace, L. and Zhao, T., 2023. Spatial Variation of PM2.5 Indoors and Outdoors: Results from 261 Regulatory Monitors Compared to 14,000 Low-Cost Monitors in Three Western States over 4.7 Years. Sensors 2023, 23, 4387.
  12. Bi, J., Wildani. Incorporating low-cost sensor measurements into high-resolution PM2.5 modeling at a large spatial scale. Environ. Sci. Technol. 2020, 54, 2152–2162.
  13. Morawska, L., Thai, P. K., Liu, X., Asumadu-Sakyi, A., Ayoko, G., Bartonova, A., et al. Applications of low-cost sensing technologies for air quality monitoring and exposure assessment: How far have they gone? Environment International 2018, 116, 286–299.
  14. Williams, R., Nash, D., Hagler, G., Benedict, K., MacGregor, I., Seay, B., Lawrence, M., and Dye, T. Peer Review and Supporting Literature Review of Air Sensor Technology Performance Targets, EPA Technical Report Undergoing Final External Peer Review 18, EPA/600/R-18/324, EPA, Washington, D.C. 20 September.
  15. Farooqui, M.Z., Biswas, J., Roychoudhry, S. and Ghosh, S. Evaluation of low-cost sensor for PM2.5 Assessment: A case study of California State. A&WMA’s 113th Virtual Annual Conference & Exhibition, San Francisco, California -July 2, 2020. 30 June.
  16. Gao, M.; Cao, J.; Seto, E. A distributed network of low-cost continuous reading sensors to measure spatiotemporal variations of PM2.5 in Xi'an, China. Environ. Pollut. 2015, 199, 56–65.
  17. PurpleAir.
  18. Stavroulas, I.; Grivas, G.; Michalopoulos, P.; Liakakou, E.; Bougiatioti, A.; Kalkavouras, P.; Fameli, K.M.; Hatzianastassiou, N.; Mihalopoulos, N.; Gerasopoulos, E. Field Evaluation of Low-Cost PM Sensors (Purple Air PA-II) Under Variable Urban Air Quality Conditions, in Greece. Atmosphere 2020, 11, 926.
  19. Mukherjee, A.; Brown, S.G.; McCarthy, M.C.; Pavlovic, N.R.; Stanton, L.G.; Snyder, J.L.; D’andrea, S.; Hafner, H.R. Measuring Spatial and Temporal PM2.5 Variations in Sacramento, California, Communities Using a Network of Low-Cost Sensors. Sensors 2019, 19, 4701.
  20. Zikova, N.; Hopke, P.K.; Ferro, A.R. Evaluation of new low-cost particle monitors for PM2.5 concentrations measurements. J. Aerosol Sci. 2017, 105, 24–34.
  21. Ardon-Dryer, K.; Dryer, Y.; Williams, J.N.; Moghimi, N. Measurements of PM2.5 with PurpleAir under atmospheric conditions. Atmos. Meas. Tech. 2020, 13, 5441–5458.
  22. South Coast Air Quality Management District (SCAQMD). Air Quality Sensor Performance Evaluation Center (AQ-SPEC). http://www.aqmd.gov/aq-spec/evaluations/summary-pm (accessed on /09/2020).
  23. Robinson, D.L. Accurate, Low Cost PM2.5 Measurements Demonstrate the Large Spatial Variation in Wood Smoke Pollution in Regional Australia and Improve Modeling and Estimates of Health Costs. Atmosphere 2020, 11, 856.
  24. Lane Regional Air Protection Agency (LRAPA). PurpleAir Monitor Correction Factor History https://www.lrapa.org/DocumentCenter/View/4147/PurpleAir-Correction-Summary (accessed on 2/11/2021).
  25. Kuula, J.; Mäkelä, T.; Hillamo, R.; Timonen, H. Response Characterization of an Inexpensive Aerosol Sensor. Sensors 2017, 17, 2915.
  26. Lu, G.Y.; Wong, D.W. An adaptive inverse-distance weighting spatial interpolation technique. Comput. Geosci. 2008, 34, 1044–1055.
  27. Cressie, N., 1991. Statistics for spatial data. Wiley, New York. ISBN 0-471-00255-0.
  28. US Environmental Protection Agency: Air Quality System, https://www.epa.gov/aqs (accessed on 2/27/2020).
  29. Ryu, J.-S.; Kim, M.-S.; Cha, K.-J.; Lee, T.H.; Choi, D.-H. Kriging interpolation methods in geostatistics and DACE model. KSME Int. J. 2002, 16, 619–632.
  30. U.S. Environmental Protection Agency (U.S. EPA): AQS Data Mart https://aqs.epa.gov/aqsweb/documents/data_mart_welcome.html (accessed on 6/22/2023).
  31. Badura, M., Batog, P., Drzeniecka-Osiadacz, A., Modzel, P. Evaluation of Low-Cost Sensors for Ambient PM2.5 Monitoring. J. of Sens. 2018, 5096540.
  32. California Fire. https://www.fire.ca.gov/incidents/2018/11/8/camp-fire (accessed on 6/5/2020).
Figure 1. PurpleAir and regulatory monitoring sites in California State.
Figure 2. Flow chart of geostatistical interpolation of daily average PurpleAir PM2.5.
Figure 3. Time-series plots of PM2.5 at FEM/FRM monitoring site with nearby PurpleAir monitors (a) Goleta (b) Lompoc-H St. (c) Fresno-Garland and (d) El Rio-El Rio Mesa School.
Figure 4. Time-series plots of PM2.5 at Non-FEM/FRM monitoring site with nearby PurpleAir monitors (a) Calexico-Ethel St. (b) Bakersfield-California Ave. (c) Sacramento-T St. and (d) Riverside-Rubidoux.
Figure 5. Scatter plots of PM2.5 at FEM/FRM and non-FEM/FRM monitoring sites with nearby PurpleAir monitors (a) Goleta (b) Lompoc-H St. (c) Fresno-Garland (d) El Rio-El Rio Mesa School (e) Calexico-Ethel St. (f) Bakersfield-California Ave. (g) Sacramento-T St. and (h) Riverside-Rubidoux.
Figure 6. Statistically interpolated daily average PurpleAir PM2.5 across California State on November 16, 2018 by Kriging and IDW.
Figure 7. Time-series plot of statistically predicted and observed PM2.5 concentrations at (a) Oakland-West, (b) Mira Loma, (c) Stockton-Hazelton, and (d) Otay Mesa.
Table 3. ANOVA statistics year wise over Bay Area.

| PM2.5 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 1,739,781 | 2 | 869,891 | 5,510 | 0 |
| Within Groups | 34,588,639 | 219,077 | 158 | | |
| Total | 36,328,420 | 219,079 | | | |
Table 4. Multi-comparison for different years using Tukey’s Test over Bay Area.

| Year | Compared year | Mean Difference | Std. Error | Sig. | 99% Confidence Interval, Lower Bound | 99% Confidence Interval, Upper Bound |
|---|---|---|---|---|---|---|
| 2019 | 2018 | -7.8 | 0.08 | 0.0 | -8.0 | -7.6 |
| 2019 | 2017 | -7.5 | 0.22 | 0.0 | -8.1 | -6.8 |
| 2018 | 2019 | 7.8 | 0.08 | 0.0 | 7.6 | 8.0 |
| 2018 | 2017 | 0.3 | 0.23 | 0.3 | -0.4 | 1.0 |
| 2017 | 2019 | 7.5 | 0.22 | 0.0 | 6.8 | 8.1 |
| 2017 | 2018 | -0.3 | 0.23 | 0.3 | -1.0 | 0.4 |
Table 5. ANOVA statistical test region wise in California in 2018.

| PM2.5 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 418,976 | 4 | 104,744 | 351 | 0 |
| Within Groups | 44,793,095 | 150,298 | 298 | | |
| Total | 45,212,071 | 150,302 | | | |
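Table 6. Multi-comparison region wise in California using Tukey’s Test for 2018.

| Region | Compared region | Mean Difference | Std. Error | Sig. | 99% Confidence Interval, Lower Bound | 99% Confidence Interval, Upper Bound |
|---|---|---|---|---|---|---|
| Bay Area | SAC Metro | -7.7 | 0.4 | 0.0 | -9.0 | -6.5 |
| Bay Area | San Diego | 2.1 | 0.4 | 0.0 | 0.9 | 3.3 |
| Bay Area | SJV | -5.3 | 0.2 | 0.0 | -5.9 | -4.6 |
| Bay Area | South Coast | 0.0 | 0.1 | 1.0 | -0.4 | 0.3 |
| SAC Metro | Bay Area | 7.7 | 0.4 | 0.0 | 6.5 | 9.0 |
| SAC Metro | San Diego | 9.8 | 0.5 | 0.0 | 8.2 | 11.5 |
| SAC Metro | SJV | 2.5 | 0.4 | 0.0 | 1.2 | 3.8 |
| SAC Metro | South Coast | 7.7 | 0.4 | 0.0 | 6.5 | 8.9 |
| San Diego | Bay Area | -2.1 | 0.4 | 0.0 | -3.3 | -0.9 |
| San Diego | SAC Metro | -9.8 | 0.5 | 0.0 | -11.5 | -8.2 |
| San Diego | SJV | -7.4 | 0.4 | 0.0 | -8.6 | -6.1 |
| San Diego | South Coast | -2.1 | 0.4 | 0.0 | -3.3 | -1.0 |
| SJV | Bay Area | 5.3 | 0.2 | 0.0 | 4.6 | 5.9 |
| SJV | SAC Metro | -2.5 | 0.4 | 0.0 | -3.8 | -1.2 |
| SJV | San Diego | 7.4 | 0.4 | 0.0 | 6.1 | 8.6 |
| SJV | South Coast | 5.2 | 0.2 | 0.0 | 4.7 | 5.8 |
| South Coast | Bay Area | 0.0 | 0.1 | 1.0 | -0.3 | 0.4 |
| South Coast | SAC Metro | -7.7 | 0.4 | 0.0 | -8.9 | -6.5 |
| South Coast | San Diego | 2.1 | 0.4 | 0.0 | 1.0 | 3.3 |
| South Coast | SJV | -5.2 | 0.2 | 0.0 | -5.8 | -4.7 |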
Table 7. Performance evaluation of statistically predicted and observed PM2.5 concentrations at selected sites in California.

| Statistic | Oakland-West, IDW | Oakland-West, Kriging | Stockton-Haz., IDW | Stockton-Haz., Kriging | Mira Loma, IDW | Mira Loma, Kriging | Otay Mesa, IDW | Otay Mesa, Kriging |
|---|---|---|---|---|---|---|---|---|
| Number of pairs | 1,084 | 1,084 | 1,079 | 1,079 | 1,083 | 1,083 | 1,052 | 1,052 |
| Mean Bias (µg/m3) | 2.48 | 1.30 | 3.46 | 0.78 | 4.53 | 1.04 | 2.89 | 2.35 |
| RMSE | 12.49 | 11.54 | 15.77 | 13.19 | 9.78 | 7.72 | 8.38 | 7.94 |
| R2 | 0.82 | 0.83 | 0.79 | 0.77 | 0.69 | 0.63 | 0.59 | 0.50 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.