Preprint Article Version 1 This version is not peer-reviewed

Exploring Determinants and Predictive Models of Latent Tuberculosis Infection Outcomes in Rural Areas of the Eastern Cape: A Pilot Comparative Analysis of Logistic Regression and Machine Learning Approaches

Version 1 : Received: 28 October 2024 / Approved: 29 October 2024 / Online: 30 October 2024 (10:36:52 CET)

How to cite: Faye, L. M.; Magwaza, C.; Dlatu, N.; Apalata, T. Exploring Determinants and Predictive Models of Latent Tuberculosis Infection Outcomes in Rural Areas of the Eastern Cape: A Pilot Comparative Analysis of Logistic Regression and Machine Learning Approaches. Preprints 2024, 2024102346. https://doi.org/10.20944/preprints202410.2346.v1

Abstract

Latent Tuberculosis Infection (LTBI) poses a significant public health challenge, especially in populations with high HIV prevalence and limited healthcare access. Early detection and targeted interventions are essential to prevent progression to active tuberculosis. This study develops predictive models for LTBI outcomes using logistic regression and machine learning approaches and evaluates strategies to improve LTBI awareness and testing. Data from rural areas in the Eastern Cape, South Africa, were analyzed to identify key demographic, health, and knowledge-related factors influencing LTBI outcomes. Logistic regression was employed to predict LTBI positivity based on factors such as age, education, and HIV status. Machine learning models, including decision trees and random forests, were also applied to compare predictive accuracy. A knowledge diffusion model was used to assess the impact of educational interventions on increasing LTBI awareness and testing rates. Logistic regression achieved an accuracy of 66.67% with high precision (80%) but low recall (33%) for LTBI-positive cases, identifying age, HIV status, and LTBI awareness as significant predictors. The random forest model had lower overall accuracy (59.26%) than logistic regression but a higher F1-score (0.63), providing a better balance between precision and recall. Feature importance analysis revealed that age, occupation, and knowledge of LTBI symptoms were the most critical factors across both models. The knowledge diffusion model demonstrated that targeted interventions significantly increased LTBI awareness and testing, particularly in high-risk groups. While logistic regression offers more interpretable results for public health interventions, machine learning models like random forests can capture complex relationships between demographic and health factors, which in this pilot translated into a better precision-recall balance rather than higher accuracy.
These findings highlight the need for targeted educational campaigns and increased LTBI testing in high-risk populations, particularly those with limited awareness of LTBI symptoms.
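
The trade-off between precision and recall reported above can be made concrete with a short calculation. The sketch below is illustrative only: it derives an F1-score for logistic regression from the precision (80%) and recall (33%) given in the abstract and sets it beside the F1-score reported for the random forest (0.63). The helper name `f1_score` is hypothetical, and the comparison assumes both F1 values refer to the LTBI-positive class, which the abstract does not state explicitly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for logistic regression (LTBI-positive cases).
lr_precision = 0.80
lr_recall = 0.33
lr_f1 = f1_score(lr_precision, lr_recall)

# F1-score reported in the abstract for the random forest model.
# Assumption: it is comparable to the derived logistic regression value.
rf_f1_reported = 0.63

print(f"Logistic regression F1 (derived):  {lr_f1:.2f}")
print(f"Random forest F1 (reported):       {rf_f1_reported:.2f}")
```

Under these assumptions the derived logistic regression F1 is about 0.47, which is consistent with the statement that the random forest balances precision and recall better despite its lower accuracy.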

Keywords

Latent Tuberculosis Infection; Logistic Regression; Machine Learning; Random Forest; Public Health; LTBI Awareness; Predictive Modeling

Subject

Public Health and Healthcare, Public Health and Health Services
