Version 1
Received: 8 October 2024 / Approved: 9 October 2024 / Online: 9 October 2024 (11:33:03 CEST)
How to cite:
Sooriyaarachchi, V.; Lary, D. J.; Wijerante, L. O. H. Causality-Driven Feature Selection for Calibrating Low-Cost Air Quality Sensors Using Machine Learning. Preprints 2024, 2024100680. https://doi.org/10.20944/preprints202410.0680.v1
APA Style
Sooriyaarachchi, V., Lary, D. J., & Wijerante, L. O. H. (2024). Causality-Driven Feature Selection for Calibrating Low-Cost Air Quality Sensors Using Machine Learning. Preprints. https://doi.org/10.20944/preprints202410.0680.v1
Chicago/Turabian Style
Sooriyaarachchi, V., David J. Lary, and Lakitha Omal Harindha Wijerante. 2024. "Causality-Driven Feature Selection for Calibrating Low-Cost Air Quality Sensors Using Machine Learning." Preprints. https://doi.org/10.20944/preprints202410.0680.v1
Abstract
Escalating global environmental challenges and worsening air quality create an urgent need for better environmental monitoring. Low-cost sensor networks are emerging as a vital solution because they can be deployed widely and affordably at fine spatial resolution. Machine learning is particularly valuable for calibrating such sensors. However, traditional machine learning models often lack interpretability and generalizability when applied to complex, dynamic environmental data. To address this, we propose a causal feature selection approach, based on convergent cross-mapping, within the machine learning pipeline to build more robustly calibrated sensor networks. We apply the approach to the calibration of low-cost optical particle counters so that they reproduce the PM1 and PM2.5 measurements recorded by research-grade spectrometers. We evaluated the predictive performance and generalizability of these causally optimized models and observed improvements in both with fewer input features, consistent with Occam's razor. For the PM1 calibration model, the proposed feature selection reduced the mean squared error on the test set by 43.2% relative to the model with all input features, whereas SHAP value-based selection achieved a reduction of only 29.6%. Similarly, for the PM2.5 model, the proposed feature selection reduced the mean squared error by 33.2%, outperforming the 30.2% reduction achieved by SHAP value-based selection. By combining sensors with advanced machine learning techniques, this approach advances urban air quality monitoring and supports a deeper scientific understanding of microenvironments. Beyond the current test cases, the feature selection method could be applied to other environmental monitoring tasks, contributing to interpretable and robust environmental models.
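To make the idea of feature selection by convergent cross-mapping more concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation. The function names, the embedding dimension E, the lag tau, and the skill threshold are all assumptions chosen for illustration, and the sketch omits the check that cross-map skill converges as the library length grows, which a full convergent cross-mapping analysis requires. The convention used here is that a feature is treated as a candidate cause of the target when the feature can be estimated from the delay-embedded target series.

```python
import numpy as np


def delay_embed(series, E, tau):
    """Return rows [s(t), s(t - tau), ..., s(t - (E - 1) * tau)] for every valid t."""
    start = (E - 1) * tau
    n = len(series)
    return np.column_stack(
        [series[start - k * tau : n - k * tau] for k in range(E)]
    )


def cross_map_skill(cause, effect, E=2, tau=1):
    """Estimate `cause` from the shadow manifold of `effect`.

    Returns the Pearson correlation between the observed and the
    cross-mapped `cause`. E and tau are illustrative defaults.
    """
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    manifold = delay_embed(effect, E, tau)
    target = cause[(E - 1) * tau :]  # aligned with the manifold rows

    # Pairwise Euclidean distances between manifold points, excluding self-matches.
    dist = np.linalg.norm(manifold[:, None, :] - manifold[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)

    # E + 1 nearest neighbours with exponential distance weighting.
    idx = np.argsort(dist, axis=1)[:, : E + 1]
    nearest = np.take_along_axis(dist, idx, axis=1)
    weights = np.exp(-nearest / np.maximum(nearest[:, :1], 1e-12))
    weights /= weights.sum(axis=1, keepdims=True)

    estimate = (weights * target[idx]).sum(axis=1)
    return float(np.corrcoef(target, estimate)[0, 1])


def select_causal_features(features, target, threshold=0.5, E=2, tau=1):
    """Keep the names of features whose cross-map skill meets a hypothetical threshold."""
    return [
        name
        for name, values in features.items()
        if cross_map_skill(values, target, E=E, tau=tau) >= threshold
    ]
```

In such a sketch, `features` would be a dictionary mapping each candidate input name to a time series of the same length as the reference measurement passed as `target`. The retained names would then be used as inputs to the calibration model. The pairwise distance matrix grows quadratically with series length, so longer records would need a nearest-neighbour index instead.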
Keywords
machine learning; causality; sensor calibration
Subject
Environmental and Earth Sciences, Atmospheric Science and Meteorology
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.