Submitted: 19 December 2024
Posted: 21 December 2024
Abstract
Air pollution poses a significant health risk worldwide, and mortality rates from ambient particulate matter pollution are increasing in many regions. This study forecasts air pollution-related mortality rates in two Central Asian cities, Bishkek (Kyrgyzstan) and Almaty (Kazakhstan). Using two time-series models, Long Short-Term Memory (LSTM) networks and Prophet, the research aims to provide accurate predictions that can inform public health policies and interventions. The methodology combines data preprocessing (filtering, linear interpolation of missing values, and min-max scaling), the two model architectures, and hyperparameter tuning, with a target forecasting accuracy above 85%. The reported R-squared values are 89.2% for LSTM and 86.5% for Prophet. The findings suggest that time-series forecasting can model the trend and seasonality of mortality rates and offer actionable insights for policymakers.
Keywords:
Introduction
Background
Problem Statement
Objectives
- Analyze historical mortality rates due to ambient particulate matter pollution in Bishkek and Almaty.
- Build and evaluate time-series forecasting models (LSTM and Prophet) to predict future trends.
- Achieve a forecasting accuracy of over 85%, providing reliable insights for policymakers.
Literature Review
Time-Series Analysis in Air Pollution Studies
Machine Learning in Forecasting
Research Gap
Methodology
Data Collection and Preprocessing
- Filtering Data: Extracting records for Kazakhstan and Kyrgyzstan.
- Handling Missing Values: Using linear interpolation to fill gaps.
- Normalization: Scaling data using MinMaxScaler for input to machine learning models.
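The three preprocessing steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the column names (`country`, `year`, `rate`), the toy values, and the choice to scale each country separately are all assumptions. The scaling line applies the same formula that scikit-learn's `MinMaxScaler` uses with its default (0, 1) range, written out by hand to keep the sketch dependency-light.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records; column names and values are illustrative only.
raw = pd.DataFrame({
    "country": ["Kazakhstan"] * 4 + ["Kyrgyzstan"] * 4 + ["Uzbekistan"] * 2,
    "year": [2000, 2001, 2002, 2003] * 2 + [2000, 2001],
    "rate": [50.0, np.nan, 54.0, 58.0, 40.0, 42.0, np.nan, 48.0, 30.0, 31.0],
})


def preprocess(df, countries=("Kazakhstan", "Kyrgyzstan")):
    # 1. Filtering: keep only the countries of interest.
    out = df[df["country"].isin(list(countries))].copy()
    out = out.sort_values(["country", "year"]).reset_index(drop=True)

    # 2. Missing values: linear interpolation within each country's series.
    out["rate"] = out.groupby("country")["rate"].transform(
        lambda s: s.interpolate(method="linear")
    )

    # 3. Normalization: min-max scaling to [0, 1], per country (an assumption).
    lo = out.groupby("country")["rate"].transform("min")
    hi = out.groupby("country")["rate"].transform("max")
    out["rate_scaled"] = (out["rate"] - lo) / (hi - lo)
    return out


clean = preprocess(raw)
print(clean)
```

In a real workflow the minimum and maximum would be taken from the training split only, so that the test period does not leak into the scaling.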
Model Development
Long Short-Term Memory (LSTM)
- Input Layer: Processes sequences of scaled data.
- Hidden Layers: Two LSTM layers with 128 units each and dropout regularization (rate: 0.2).
- Output Layer: A dense layer with a ReLU activation function to predict mortality rates.
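A network with the layers listed above might look like the following Keras sketch. Only the layer sizes, the dropout rate, and the ReLU output come from the description; the window length, the placement of dropout after each LSTM layer, the Adam optimizer, and the mean-squared-error loss are assumptions, and `make_windows` and `build_lstm` are hypothetical helper names.

```python
import numpy as np


def make_windows(series, window):
    """Turn a 1-D scaled series into (samples, window, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype="float32")
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y


def build_lstm(window, units=128, dropout=0.2):
    # Imported lazily so the windowing helper works without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(window, 1)),            # sequences of scaled values
        keras.layers.LSTM(units, return_sequences=True),  # first LSTM layer
        keras.layers.Dropout(dropout),
        keras.layers.LSTM(units),                         # second LSTM layer
        keras.layers.Dropout(dropout),
        keras.layers.Dense(1, activation="relu"),         # non-negative prediction
    ])
    model.compile(optimizer="adam", loss="mse")           # assumed training setup
    return model


# Toy scaled series, for illustration only.
series = np.linspace(0.0, 1.0, 12)
X, y = make_windows(series, window=3)
print(X.shape, y.shape)
```

With TensorFlow installed, `build_lstm(window=3).fit(X, y, epochs=10, verbose=0)` would train on these windows. A ReLU output fits here because min-max scaled mortality rates are non-negative.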
Prophet
- Yearly seasonality adjustment.
- Changepoint flexibility to capture abrupt shifts in trends.
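The two Prophet settings above map onto constructor arguments, as in this hedged sketch. The value `changepoint_prior_scale=0.5` is an assumption chosen only to show a trend more flexible than Prophet's default of 0.05; the forecast horizon and the helper names `to_prophet_frame` and `fit_prophet` are likewise hypothetical.

```python
import pandas as pd


def to_prophet_frame(years, rates):
    """Prophet expects a frame with a datetime column `ds` and a value column `y`."""
    return pd.DataFrame({
        "ds": pd.to_datetime([f"{int(y)}-01-01" for y in years]),
        "y": [float(r) for r in rates],
    })


def fit_prophet(frame, horizon_years=5, changepoint_prior_scale=0.5):
    # Imported lazily so the data-shaping helper works without Prophet installed.
    from prophet import Prophet

    model = Prophet(
        yearly_seasonality=True,                          # yearly seasonality adjustment
        changepoint_prior_scale=changepoint_prior_scale,  # higher = more flexible trend
    )
    model.fit(frame)
    future = model.make_future_dataframe(periods=horizon_years, freq="YS")
    return model.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Toy values, for illustration only.
frame = to_prophet_frame([2000, 2001, 2002], [50.0, 52.0, 54.0])
print(frame)
```

If the series has only one observation per year, a yearly seasonal component may be hard to estimate, so the changepoint setting is likely to matter more than the seasonality flag.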
Evaluation Metrics
- Root Mean Squared Error (RMSE): Measures the average magnitude of error.
- R-squared (R²): Assesses the goodness of fit.
- Mean Absolute Percentage Error (MAPE): Quantifies prediction accuracy as a percentage.
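For reference, the three metrics can be computed with the standard textbook definitions, as sketched below. The sample arrays are made up for illustration, and R-squared is returned as a fraction (multiply by 100 for a percentage).

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; undefined if any true value is 0."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


# Illustrative values only, not data from the study.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(rmse(actual, predicted), r_squared(actual, predicted), mape(actual, predicted))
```

RMSE depends on the scale of the target, so it should be read alongside the unit of the mortality rate, whereas R-squared and MAPE are scale-free.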
Results
Data Analysis
Model Performance
LSTM Model

- RMSE: 2.85
- R-squared: 89.2%
- MAPE: 6.7%
Prophet Model

- RMSE: 3.12
- R-squared: 86.5%
- MAPE: 8.1%
Visualization
- LSTM predictions closely align with actual values, showcasing minimal deviation.
- Prophet forecasts highlight seasonal and long-term trends, providing actionable insights.
Discussion
Implications
Limitations
- Limited granularity: Annual data may overlook short-term fluctuations.
- External factors: Variables like economic changes, healthcare improvements, and policy interventions were not included.
Future Work
- Incorporate additional features (e.g., meteorological data, industrial activity).
- Explore ensemble methods combining LSTM and Prophet.
- Develop real-time forecasting systems using streaming data.
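One simple way the LSTM and Prophet forecasts could be combined is a weighted average with weights inversely proportional to each model's error. This is only a sketch of one possible ensemble, not a method from the study: the weighting scheme and the function names are assumptions, the forecast values are made up, and the RMSE figures are the ones reported in the Results section.

```python
import numpy as np


def inverse_error_weights(errors):
    """Weights proportional to 1 / error, normalized to sum to 1."""
    inv = 1.0 / np.asarray(errors, dtype=float)
    return inv / inv.sum()


def blend(forecasts, weights):
    """Weighted average of forecasts with shape (models, horizon)."""
    forecasts = np.asarray(forecasts, dtype=float)
    return weights @ forecasts


# RMSEs reported for LSTM (2.85) and Prophet (3.12).
weights = inverse_error_weights([2.85, 3.12])

# Hypothetical two-step forecasts from each model, for illustration only.
combined = blend([[50.0, 51.0], [52.0, 53.0]], weights)
print(weights, combined)
```

In practice the weights would be derived from errors on a held-out validation period so that the evaluation of the ensemble stays unbiased.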
Conclusions
References
- World Health Organization. (2021). Ambient air pollution: A global assessment of exposure and burden of disease.
- Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
- Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37-45.
- Global Burden of Disease Collaborative Network. (2020). Global burden of disease study 2019 (GBD 2019) results.
- Kaggle. (2023). Air Pollution Dataset. Available online: https://www.kaggle.com.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).