Preprint Article Version 1 This version is not peer-reviewed

Causality-Driven Feature Selection for Calibrating Low-Cost Air Quality Sensors Using Machine Learning

Version 1 : Received: 8 October 2024 / Approved: 9 October 2024 / Online: 9 October 2024 (11:33:03 CEST)

How to cite: Sooriyaarachchi, V.; Lary, D. J.; Wijerante, L. O. H. Causality-Driven Feature Selection for Calibrating Low-Cost Air Quality Sensors Using Machine Learning. Preprints 2024, 2024100680. https://doi.org/10.20944/preprints202410.0680.v1

Abstract

With escalating global environmental challenges and worsening air quality, there is an urgent need for enhanced environmental monitoring capabilities. Low-cost sensor networks are emerging as a vital solution, enabling widespread and affordable deployment at fine spatial resolutions. In this context, machine learning for the calibration of low-cost sensors is particularly valuable. However, traditional machine learning models often lack interpretability and generalizability when applied to complex, dynamic environmental data. To address this, we propose a causal feature selection approach based on convergent cross-mapping within the machine learning pipeline to build more robustly calibrated sensor networks. We apply this approach to the calibration of low-cost optical particle counters, effectively reproducing the PM1 and PM2.5 measurements recorded by research-grade spectrometers. We evaluated the predictive performance and generalizability of these causally optimized models and observed improvements in both while reducing the number of input features, in keeping with Occam's razor. For the PM1 calibration model, the proposed feature selection reduced the mean squared error on the test set by 43.2% compared to the model with all input features, whereas SHAP value-based selection achieved a reduction of only 29.6%. Similarly, for the PM2.5 model, the proposed feature selection led to a 33.2% reduction in the mean squared error, outperforming the 30.2% reduction achieved by SHAP value-based selection. By integrating sensors with advanced machine learning techniques, this approach advances urban air quality monitoring and fosters a deeper scientific understanding of microenvironments. Beyond the current test cases, this feature selection method could be applied to other environmental monitoring problems, contributing to the development of interpretable and robust environmental models.
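
To make the idea of feature selection by convergent cross-mapping (CCM) more concrete, the following is a minimal sketch in Python. It is not the authors' implementation. The function names (`delay_embed`, `ccm_skill`, `rank_features`), the embedding dimension `dim=2`, and the lag `tau=1` are assumptions chosen for illustration. The sketch follows the commonly described CCM recipe: build a time-delay embedding of the target, find the `dim + 1` nearest neighbours of each embedded point, and use exponentially weighted neighbours to estimate the candidate feature. A high correlation between the estimate and the observed feature is taken as evidence that the feature influences the target.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Time-delay embedding: each row is [x[t], x[t - tau], ..., x[t - (dim - 1) * tau]]."""
    x = np.asarray(x, dtype=float)
    start = (dim - 1) * tau
    cols = [x[start - k * tau: len(x) - k * tau] for k in range(dim)]
    return np.column_stack(cols)


def ccm_skill(source, target, dim=2, tau=1):
    """Correlation between `source` and its estimate from the shadow manifold of `target`.

    A high value suggests that `source` leaves a recoverable signature in
    `target`, i.e. that `source` may causally influence `target`.
    """
    source = np.asarray(source, dtype=float)
    manifold = delay_embed(target, dim, tau)
    aligned = source[(dim - 1) * tau:]  # source values aligned with manifold rows

    # Pairwise Euclidean distances between embedded points (O(n^2) memory).
    diffs = manifold[:, None, :] - manifold[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point may not be its own neighbour

    # dim + 1 nearest neighbours, ordered from closest to farthest.
    k = dim + 1
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dists, nn_idx, axis=1)

    # Exponential weights scaled by the distance to the closest neighbour.
    scale = np.maximum(nn_dist[:, :1], 1e-12)
    weights = np.exp(-nn_dist / scale)
    weights /= weights.sum(axis=1, keepdims=True)

    estimate = (weights * aligned[nn_idx]).sum(axis=1)
    return np.corrcoef(estimate, aligned)[0, 1]


def rank_features(features, target, dim=2, tau=1):
    """Rank candidate features (dict of name -> series) by cross-map skill, best first."""
    scores = {name: ccm_skill(series, target, dim, tau)
              for name, series in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In a calibration setting, `target` would be the reference measurement (for example PM2.5 from the research-grade instrument) and `features` the candidate low-cost sensor inputs. Features with low cross-map skill would be candidates for removal. A fuller analysis would also check that the skill converges as the length of the time series grows, which this sketch omits.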

Keywords

machine learning; causality; sensor calibration

Subject

Environmental and Earth Sciences, Atmospheric Science and Meteorology
