3.1. Observed PurpleAir and Regulatory PM2.5
It was imperative to evaluate and validate PurpleAir PM2.5 against PM2.5 observed at regulatory sites. Figure 1 also shows the FEM/FRM and non-FEM/FRM PM2.5 monitoring sites in California during 2016 and 2023. Details of these sites are given in Table 1, including AQS ID, site name, PurpleAir monitor ID, and dates of monitoring, along with the approximate distance calculated between each PurpleAir monitor and the regulatory site. Regulatory site data from 2016 to 2022 were downloaded from the EPA AQS Datamart [30]. The most recent PM2.5 data available, through October 2022, were used for the analysis. PurpleAir PM2.5 was evaluated graphically and statistically against both FEM/FRM and non-FEM/FRM PM2.5. All regulatory sites with a PurpleAir monitor within 20 meters were analysed. For discussion, time-series and scatter plots are shown only for four FEM/FRM and four non-FEM/FRM sites, selected to cover California from north to south.
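The collocation screening described above (a PurpleAir monitor within 20 meters of a regulatory site) can be sketched as follows. The text does not state how the approximate distances were calculated; the haversine great-circle formula, the function names, and the coordinates used below are illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_collocated(sensor, site, max_distance_m=20.0):
    """True when a (lat, lon) sensor lies within max_distance_m of a (lat, lon) site."""
    return haversine_m(sensor[0], sensor[1], site[0], site[1]) <= max_distance_m


# Hypothetical coordinates, roughly 11 m apart in latitude.
site = (36.7800, -119.7700)
sensor = (36.7801, -119.7700)
print(round(haversine_m(*sensor, *site), 1), is_collocated(sensor, site))
```

Over distances of a few meters, a planar approximation would give nearly the same result; the haversine form is used here only because it needs no map projection.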
Table 1.
Statistical assessment of hourly average PurpleAir PM2.5 at selected sites for the years 2016 and 2022 at FEM/FRM and non-FEM/FRM sites.
| Site Name and AQS ID, POC | PurpleAir Sensor Index | Distance (m) | Dates | Number of Paired Observations | R2 | Mean Bias (µg/m3) | RMSE (µg/m3) |
|---|---|---|---|---|---|---|---|
| FEM/FRM | | | | | | | |
| El Rio-Rio Mesa Schl. 061113001, 3 | 9594 | 0.18 | 4/2/18 to 8/31/22 | 30,798 | 0.50 | 3.5 | 7.85 |
| Fresno-Garland 060190011, 3 | 2358 | 1.87 | 7/31/17 to 12/9/19 | 17,194 | 0.83 | 5.7 | 11.85 |
| Goleta-Fairview 060832011, 1 | 16705 | 0 | 9/29/18 to 6/30/22 | 28,532 | 0.56 | 3.4 | 6.93 |
| Lompoc 060832004, 1 | 16703 | 0 | 9/29/18 to 6/30/22 | 28,482 | 0.60 | 2.1 | 6.21 |
| Non-FEM/FRM | | | | | | | |
| Bakersfield 060290014, 3 | 2350 | 0.6 | 7/31/17 to 3/8/22 | 34,583 | 0.73 | 4.3 | 11.70 |
| Calexico-Ethel Street 060250005, 3 | 1174 | 0.0 | 10/24/17 to 2/19/18 | 2,486 | 0.89 | 3.5 | 11.60 |
| Sacramento-T Street 060670010, 3 | 8440 | 2.1 | 2/2/19 to 12/11/20 | 14,797 | 0.86 | 4.2 | 10.20 |
| Riverside 060658001, 9 | 1854 | 8.5 | 7/10/17 to 12/31/19 | 14,604 | 0.69 | 5.7 | 9.70 |
Table 2.
Sensor Data summary over different regions in California.
| PM2.5 (µg/m3) | Bay Area (2019) | Sac Metro (2019) | San Diego (2019) | SJV (2019) | South Coast (2019) | Bay Area (2018) | Sac Metro (2018) | San Diego (2018) | SJV (2018) | South Coast (2018) |
|---|---|---|---|---|---|---|---|---|---|---|
| Count | 184,668 | 8,918 | 5,240 | 4,497 | 115,764 | 31,203 | 2,255 | 2,355 | 11,242 | 103,248 |
| Maximum | 201 | 101 | 217 | 155 | 185 | 251 | 300 | 73 | 277 | 306 |
| Mean | 7 | 10 | 11 | 13 | 13 | 15 | 23 | 13 | 20 | 15 |
| Median | 5 | 6 | 9 | 8 | 10 | 7 | 11 | 11 | 13 | 12 |
| Std. Deviation | 8 | 12 | 9 | 15 | 11 | 26 | 41 | 10 | 21 | 12 |
| Standard Error | 0.02 | 0.13 | 0.13 | 0.1 | 0.03 | 0.15 | 0.86 | 0.2 | 0.2 | 4 |
| Q1 | 2 | 3 | 5 | 4 | 5 | 3 | 4 | 6 | 6 | 7 |
| Q3 | 9 | 11 | 14 | 14 | 17 | 15 | 25 | 18 | 29 | 21 |
| Interquartile Range | 7 | 8 | 9 | 10 | 12 | 12 | 21 | 12 | 23 | 14 |
Figure 3 and Figure 4 show hourly average regulatory PM2.5 (black lines) at four FEM/FRM and four non-FEM/FRM sites, together with PurpleAir PM2.5 (purple dots). These figures show that the PurpleAir monitors captured the PM2.5 trend at the regulatory monitors from 2016 to 2022. PurpleAir reported higher PM2.5 concentrations than both the FEM/FRM and non-FEM/FRM regulatory monitors. Like the regulatory monitors, the PurpleAir monitors also captured the PM2.5 events caused by forest fires. PurpleAir PM2.5 followed the regulatory trends for concentrations both below and above 100 µg/m3. PM2.5 concentrations above 200 µg/m3 were captured by PurpleAir at Fresno-Garland (Figure 3(c)) and at all non-FEM/FRM sites, with the exception of a one-day spike at El Rio-El Rio Mesa School (Figure 3(d)). Spikes in PM2.5 concentrations at Sacramento-T Street (Figure 4(c)) were caused by forest fires, and the trend is visible in both the regulatory and PurpleAir data. Episodic PM2.5 events from windblown dust at Calexico-Ethel Street (Figure 4(a)) were also captured by both regulatory and PurpleAir monitors. Thus, the sensors were able to capture local as well as regional episodic events.
Figure 5 shows scatter plots of hourly average PurpleAir PM2.5 concentrations (y-axis) against regulatory PM2.5 concentrations (x-axis). These plots show that PurpleAir reported higher concentrations than the regulatory monitors most of the time. The scatter plots also include dotted +/- 25% lines; most points fall outside the +/- 25% range, with more of them on the y-axis (PurpleAir PM2.5) side. The linear fit line lies above the +25% line at all sites except the El Rio School site (Figure 5(b)), which shows a one-to-one linear fit.
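The +/- 25% comparison described above can be expressed as a simple count. The sketch below assumes the dotted lines are y = 0.75x and y = 1.25x, which the text does not state explicitly; the function name and sample values are hypothetical.

```python
def fraction_within_band(purpleair, regulatory, tolerance=0.25):
    """Share of paired values where PurpleAir lies within +/- tolerance of regulatory.

    Pairs whose regulatory value is zero or negative are skipped, because a
    relative band is undefined for them.
    """
    pairs = [(p, r) for p, r in zip(purpleair, regulatory) if r > 0]
    if not pairs:
        raise ValueError("no pairs with a positive regulatory value")
    inside = sum(
        1 for p, r in pairs
        if (1 - tolerance) * r <= p <= (1 + tolerance) * r
    )
    return inside / len(pairs)


# Hypothetical hourly pairs (µg/m3): two of the four fall inside the band.
purpleair = [10, 14, 30, 8]
regulatory = [10, 10, 20, 10]
print(fraction_within_band(purpleair, regulatory))
```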
PurpleAir PM2.5 was mostly higher than regulatory PM2.5. This may be because PurpleAir monitors are calibrated by the manufacturer using particles with properties completely different from those of particulate matter in ambient air [31], and the conversion of particle counts to mass is also unknown [15]. In addition, ambient air contains water droplets of comparable aerodynamic particle size. Both FEM/FRM and non-FEM/FRM monitors traditionally measure PM2.5 after removing the water content of the sample, which is achieved by heating the sample air in the inlet pipe. In contrast, PurpleAir sensors measure PM2.5 concentrations without removing the moisture content of the aerosols. It is this water content that makes PurpleAir PM2.5 an “Absolute PM2.5” or, relative to regulatory monitors, a “Wet PM2.5”. How water content is accounted for when PurpleAir converts particle counts to mass is unknown. Therefore, even before any comparison with FEM/FRM and non-FEM/FRM PM2.5 is made, PurpleAir PM2.5 concentrations can be expected to exceed those of the regulatory monitors most of the time.
Table 1 shows the statistical evaluation of the PurpleAir monitors against the regulatory monitors. The coefficient of determination (R2), mean bias (MB), and root mean square error (RMSE) were calculated. Mean bias estimates the average difference between the two sets of measurements. R2 indicates how well a regression model fits the observed data. RMSE is a frequently used measure of the magnitude of the error between two variables. Equations of the evaluation indices are shown below:
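The equations themselves are not reproduced in this text. The standard forms of the three indices, written with the symbols defined in this section, are assumed to be the following (R2 is taken here as the squared Pearson correlation, which is consistent with the use of both means in the symbol definitions):

```latex
\mathrm{MB} = \frac{1}{n}\sum_{i=1}^{n}\left(P_i - R_i\right)

\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - R_i\right)^{2}}

R^{2} = \left[\frac{\sum_{i=1}^{n}\left(P_i - \bar{P}\right)\left(R_i - \bar{R}\right)}
{\sqrt{\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(R_i - \bar{R}\right)^{2}}}\right]^{2}
```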
where $P_i$ is the PurpleAir PM2.5 concentration, $R_i$ is the regulatory PM2.5 concentration, $\bar{R}$ is the mean of $R_i$, $\bar{P}$ is the mean of $P_i$, and $n$ is the number of hourly samples.
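As a minimal sketch, the three indices can be computed from paired hourly values as shown here. The function name and sample data are hypothetical, MB is taken as PurpleAir minus regulatory (so that a positive value means PurpleAir reads higher), and R2 is computed as the squared Pearson correlation, which is an assumption about the form used in the study.

```python
import math


def evaluation_indices(purpleair, regulatory):
    """Return MB, RMSE, and R2 for paired PurpleAir and regulatory values."""
    if len(purpleair) != len(regulatory) or len(purpleair) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    n = len(purpleair)
    mean_p = sum(purpleair) / n
    mean_r = sum(regulatory) / n

    diffs = [p - r for p, r in zip(purpleair, regulatory)]
    mb = sum(diffs) / n                             # mean bias
    rmse = math.sqrt(sum(d * d for d in diffs) / n)  # root mean square error

    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(purpleair, regulatory))
    ss_p = sum((p - mean_p) ** 2 for p in purpleair)
    ss_r = sum((r - mean_r) ** 2 for r in regulatory)
    if ss_p == 0 or ss_r == 0:
        raise ValueError("R2 is undefined for a constant series")
    r2 = cov * cov / (ss_p * ss_r)                  # squared Pearson correlation
    return {"MB": mb, "RMSE": rmse, "R2": r2}


# Hypothetical pairs: PurpleAir reads exactly 2 µg/m3 above the regulatory value.
print(evaluation_indices([12.0, 22.0, 32.0], [10.0, 20.0, 30.0]))
```

In this example a constant offset gives MB = RMSE = 2 with R2 = 1, which illustrates that a high R2 does not by itself imply low bias.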
For all FEM/FRM sites (Table 1 and Table S1), R2 values were between 0.23 and 0.9 with an average of 0.62, and for all non-FEM/FRM sites (Table 1 and Table S2), R2 values were between 0.27 and 0.92 with an average of 0.74. These values were lower than those reported by studies conducted over shorter durations (SCAQMD, 2020; LRAPA, 2021; Gupta et al., 2018). Goleta, El Rio-El Rio School, and Lompoc-H Street showed the lowest R2 values of 0.56, 0.5, and 0.6, respectively. These three sites lie along the Southern California coast, where the moisture content of the air is expected to be higher than inland. This suggests that moisture content plays a significant role in PurpleAir PM2.5 monitoring. Because PM is hygroscopic, it takes up moisture from the air, which results in higher reported concentrations. Unlike regulatory monitors, PurpleAir monitors currently do not heat the inlet air. The remaining sites, located inland, showed higher R2 values of about 0.70 or greater. The mean bias was highest at Fresno-Garland (9.62 µg/m3), followed by Sacramento-T Street (6.93 µg/m3), as shown in Supplement Tables S1 and S2. Mean bias was positive at all sites, showing that PurpleAir reported higher PM2.5 than both FEM/FRM and non-FEM/FRM monitors.
After the performance of the PurpleAir sensors was validated against observed data, the sensor data were used to compute detailed summary statistics for different regions of California: Bay Area Air Quality Management District (AQMD) (Bay Area), Sacramento Metropolitan AQMD (Sac Metro), San Diego Air Pollution Control District (APCD) (San Diego), San Joaquin Valley APCD (SJV), and South Coast AQMD (South Coast), according to the availability of data for the years 2016-2019. After poorly performing sensors (around 4%) were excluded, all remaining PurpleAir sensors were used in this statistical analysis.
Table 2 presents the statistical analysis for the two most recent years, 2018 and 2019.
A wide range of PM2.5 concentrations was seen across the sensor dataset, with a maximum 24-hour average of around 300 μg/m3 measured in the South Coast area near Los Angeles, and with maximum concentrations in northern California showing the impact of wildfires. Overall, the median PM2.5 concentration of the dataset was between 5 and 13 μg/m3 (interquartile range: 7 to 23 μg/m3). For the individual counties, the standard deviation ranged from 8 to 40 μg/m3 across the entire state of California.
In the Bay Area, the average daily mean across sensors decreased from 15 µg/m3 in 2018 to 7 µg/m3 in 2019. The standard deviation was also lower in 2019 (8 µg/m3) than in 2018 (15 µg/m3). The number of sensors in the Bay Area increased significantly, from around 369 in 2018 to around 773 in 2019, a growth of about 110 percent. The interquartile range also decreased, from around 12 µg/m3 in 2018 to around 6 µg/m3 in 2019. Sac Metro and San Diego showed more modest growth in the number of sensors (94 percent in Sac Metro and 18 percent in San Diego) than the Bay Area. The southern part of California (South Coast) also had over 450 sensors in 2019.
Overall, PM2.5 values showed lower magnitude and variability across California in 2019 than in 2018, indicating an improving trend in PM2.5 concentrations. Wildfires were more intense in 2018 than in 2019, as reflected in the maximum and standard deviation values for the Bay Area.
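The summary statistics reported in Table 2 can be reproduced for any list of PM2.5 values with the Python standard library, as sketched below. The function name and sample values are hypothetical, and the quartile convention (the "inclusive" method here) is an assumption, since the text does not say which one was used.

```python
import math
import statistics


def summarize(values):
    """Return the Table 2 style summary statistics for a list of PM2.5 values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 values")
    # Quartile convention is an assumption; other methods give slightly different Q1/Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    std = statistics.stdev(values)  # sample standard deviation
    return {
        "count": n,
        "maximum": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": std,
        "standard_error": std / math.sqrt(n),  # standard error of the mean
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Hypothetical daily values (µg/m3).
print(summarize([2, 4, 6, 8, 10]))
```

The standard error follows from the standard deviation and the count: for the Bay Area in 2019, 8 / sqrt(184,668) is approximately 0.02, in line with the tabulated value.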
The ANOVA (Analysis of Variance) test was performed to determine whether the daily mean PM2.5 levels measured by the sensors across California were the same or significantly different during the study period (2017-2019). The test indicated a significant difference among the daily averages at the 95% confidence level (p = 0.00, F statistic > 0) for the different years and for the different regions. The ANOVA and Tukey test results for the Bay Area for 2017, 2018, and 2019 are presented as a representative result in Table 3.
Because the ANOVA indicated significant differences among the daily means, the Tukey post hoc test was performed for detailed multiple comparisons. According to the Tukey test, the highest daily mean PM2.5 levels for the Bay Area were observed in 2018 (Table 4). However, the differences in daily mean concentrations for the Bay Area between 2017 and 2018 were not significant (p > 0.05), although the differences with respect to 2019 were significant (Table 4). For San Diego, the differences in daily mean concentrations were significant for all years (p = 0.00). For South Coast and SJV, the highest daily means were observed in 2016. For South Coast, the daily mean PM2.5 concentrations did not differ significantly between 2016 and 2018 (p > 0.05) but differed significantly from those of 2017 and 2019 (p = 0.00). For SJV, the differences in daily means were not significant between 2016 and 2017 (p > 0.05) but were significant for 2018 and 2019 (p = 0.00). Therefore, 2016, 2017, and 2018 may be considered the period of the highest daily PM2.5 concentrations measured in California.
Results for the other areas are included in the supplementary material. According to the ANOVA and the multi-comparison Tukey test, the lowest daily mean PM2.5 levels were measured in 2019, indicating a decreasing trend in daily PM2.5 concentrations in the study area. These results are also corroborated in Table 4.
A region-wise inter-comparison (Bay Area, San Diego, San Mateo, SJV, and South Coast) of the daily means of sensor data for 2018, the year in which California had the highest number of wildfires, was carried out using ANOVA (Table 5) and Tukey’s multi-comparison post hoc test (Table 6). It revealed the highest daily mean concentrations in the Bay Area (probably due to forest fires), followed closely by South Coast, which is more polluted owing to its proximity to Los Angeles. The differences between the Bay Area and South Coast were not significant (p > 0.05). However, the differences between these two regions and the others (San Mateo, San Diego, and SJV) were significant (p = 0.00, F statistic > 0).
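The F statistic behind the one-way ANOVA comparisons discussed above is the ratio of between-group to within-group mean squares. The sketch below computes it with the Python standard library only; the function name and group values are hypothetical, and it is not the software used in the study. Obtaining the p-value requires the F distribution, which a statistics package such as SciPy provides (`scipy.stats.f_oneway`), and the Tukey pairwise comparisons are not covered here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least 2 non-empty groups and more values than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variation")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical daily means (µg/m3) for three years or regions.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

A large F means the group means differ by more than the scatter within groups would explain, which is the condition under which a post hoc test such as Tukey's is then used to identify which pairs differ.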
3.2. Geostatistically Predicted and Observed PM2.5
The regulatory monitoring network is too sparse to support community-scale PM2.5 exposure assessments. The PurpleAir network provides denser coverage across California, down to the community scale, than the existing regulatory monitoring network. Geostatistical interpolation of PurpleAir PM2.5 using Kriging and inverse distance weighting (IDW) might help bridge the gap between PurpleAir and regulatory PM2.5. Interpolation was done using daily average PurpleAir PM2.5 for the years 2018 and 2020, because PurpleAir monitoring began in California in 2016 and few monitors were in operation until the end of 2017.
Figure 6 shows daily average PM2.5 from PurpleAir, FEM/FRM, and non-FEM/FRM monitors on November 16, 2018, interpolated by Kriging and IDW. Both interpolation techniques captured the dispersion of smoke from the Camp Fire, which started on November 8, 2018 [32]. The difference in spatially interpolated daily average PurpleAir PM2.5 in northern California was due to the different interpolation approaches of Kriging and IDW. The PM2.5 interpolated from PurpleAir gave a better representation of PM2.5 because of the dense network of monitors available for interpolation, in comparison to the sparse network of FEM/FRM and non-FEM/FRM monitors. For further analysis, four regulatory sites across California without collocated PurpleAir monitors were selected for assessment. Sites with collocated monitors were not selected, to avoid the influence of PurpleAir PM2.5 measured at the same location.
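Of the two interpolation techniques, IDW is simple enough to sketch in a few lines: the estimate at an unmonitored location is a weighted average of nearby sensor values, with weights that fall off with distance. The power of 2, the planar coordinates, the function name, and the sample values below are all assumptions for illustration; the text does not state the IDW settings or software used, and Kriging is not covered by this sketch.

```python
import math


def idw_estimate(target, sensors, power=2.0):
    """Inverse distance weighted estimate at target = (x, y).

    sensors is a list of (x, y, value) tuples in the same projected units as
    target. The power parameter is an assumed setting, not taken from the study.
    """
    if not sensors:
        raise ValueError("need at least one sensor")
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in sensors:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a sensor: use its value directly
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical sensors (x km, y km, daily mean PM2.5 in µg/m3).
sensors = [(0.0, 0.0, 10.0), (3.0, 0.0, 40.0)]
print(idw_estimate((1.0, 0.0), sensors))
```

Because the nearer sensor receives a larger weight, the estimate at (1, 0) lies closer to 10 than to 40. Excluding collocated sensors, as described above, avoids the zero-distance case in which the estimate would simply reproduce the local PurpleAir value.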
Figure 7 shows observed daily average PM2.5 concentrations (black line) and interpolated PM2.5 concentrations at the four regulatory monitoring sites mentioned above. The time-series plots show good agreement between observed and interpolated PM2.5, and both IDW and Kriging captured the peaks of observed PM2.5. However, on many days Kriging and IDW over-predicted PM2.5, as shown in Figure 8. The over-prediction may be due to the higher PM2.5 reported by PurpleAir monitors. Scatter plots with interpolated PM2.5 on the y-axis and regulatory PM2.5 on the x-axis show good agreement, with most interpolated values falling within +/- 25%. Both geostatistical techniques, Kriging and IDW, demonstrated that they can be used to interpolate daily average PurpleAir PM2.5 at unmonitored locations for exposure and air quality assessments. The agreement between geostatistically interpolated PurpleAir PM2.5 and observed daily average PM2.5 gives confidence in using PurpleAir PM2.5 together with regulatory monitors to estimate PM2.5 at unmonitored locations. This demonstrates that low-cost PM2.5 sensors have the potential to fill gaps in regulatory monitoring networks and might help overcome their limitations and improve air quality and other scientific assessments. PurpleAir PM2.5 can be integrated with observed regulatory PM2.5 to formulate a decision support system using geostatistical techniques, but the uncertainty in the sensor measurements should first be minimized before the sensors are used to supplement regulatory monitors.
Table 7 shows the statistical evaluation of daily average PurpleAir PM2.5 interpolated with Kriging and IDW against daily average observed PM2.5 concentrations. PM2.5 interpolated by Kriging has lower root mean square error (RMSE) and mean bias (MB) values than PM2.5 interpolated by IDW. Correlation coefficient values at the Oakland-West and Stockton-Hazelton sites were above 0.76 and were lower at the Mira Loma and Otay Mesa sites.