1. Introduction
Latent Tuberculosis Infection (LTBI) is a global public health concern, particularly in regions with high tuberculosis (TB) prevalence and populations vulnerable to progressing from latent to active TB [
1,
2]. In many cases, individuals with LTBI are asymptomatic, yet the infection can develop into active TB if left untreated, leading to significant morbidity and mortality [
3]. The World Health Organization (WHO) estimates that nearly one-quarter of the global population has LTBI, with certain regions, particularly those with high HIV prevalence, being disproportionately affected [
4]. In South Africa, where HIV co-infection rates are among the highest in the world, LTBI poses a serious risk to population health and healthcare systems [
5]. Early detection and intervention are crucial in preventing the spread of TB and reducing the risk of progression to active disease. However, the silent nature of LTBI and the lack of widespread testing contribute to underdiagnosis [
6]. Understanding the key determinants influencing LTBI prevalence and identifying high-risk groups is essential for designing effective public health interventions. In rural areas of the Eastern Cape, socio-economic challenges, limited healthcare access, and low awareness about LTBI further exacerbate the problem, making targeted interventions a priority [
5]. Advances in predictive modeling and machine learning offer valuable tools for identifying individuals at risk of LTBI [
7,
8,
9]. Logistic regression models have traditionally been used in epidemiological studies due to their interpretability and ability to quantify relationships between risk factors and health outcomes [
10,
11]. However, machine learning techniques, such as decision trees and random forests, are increasingly being applied in public health research to capture complex, non-linear interactions between variables, offering potentially higher predictive accuracy [
12,
13,
14].
This study aimed to develop predictive models to assess the likelihood of LTBI positivity based on demographic, health, and knowledge-related factors in rural areas of the Eastern Cape. We applied logistic regression, decision tree, and random forest models and evaluated their performance in predicting LTBI outcomes. Additionally, we used a knowledge diffusion model to explore strategies for improving LTBI awareness and testing rates. This research aims to provide actionable insights that can inform public health strategies, particularly in high-risk communities, and contribute to the broader effort to control TB in resource-limited settings.
2. Materials and Methods
2.1. Data Collection
Data were collected from a healthcare facility serving rural areas of the Eastern Cape, focusing on demographic factors (age, gender, education, occupation), health status (HIV status, comorbidities), and survey responses related to LTBI awareness and testing. The dependent variable was the LTBI test result (Positive/Negative), and the independent variables were age, gender, education, HIV status, comorbidities, and responses to the LTBI knowledge questions.
2.2. Logistic Regression Model
Logistic regression was used to estimate the likelihood of LTBI positivity from demographic and health variables. Model performance was evaluated using accuracy, precision, recall, and F1-score. Odds ratios were calculated for each predictor to interpret its effect on LTBI positivity.
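The relationship between a fitted logistic regression coefficient and the odds ratio reported for a predictor can be illustrated with a minimal Python sketch. It is not the study's code (the analysis was run in R), and the predictor names and coefficient values below are hypothetical placeholders rather than fitted estimates.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(coefficient)


def predicted_probability(intercept, coefficients, values):
    """Logistic function applied to the linear predictor."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only (not estimates from this study).
example_coefficients = {"hiv_positive": 0.9, "higher_education": -0.5}

for name, beta in example_coefficients.items():
    # An odds ratio above 1 raises the odds of a positive result; below 1 lowers them.
    print(f"{name}: coefficient = {beta:+.2f}, odds ratio = {odds_ratio(beta):.2f}")
```

A coefficient of zero corresponds to an odds ratio of 1 (no association), which is why positive and negative coefficients are read as risk-increasing and risk-reducing factors respectively.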
2.3. Machine Learning Models
Decision trees and random forests were evaluated and compared to assess their suitability for predicting LTBI outcomes, particularly for identifying complex interactions between risk factors. Accuracy, precision, recall, and F1-score were calculated for each model. Feature importance was analyzed to understand the influence of key variables such as age, knowledge level, and occupation.
2.4. Data Analysis
We utilized STATA v15 to perform data cleaning and basic descriptive statistics. As part of the data cleaning process, we converted categorical data into numerical data by applying numerical value labels based on a pre-established codebook.
2.5. Prediction Tools and Software
RStudio version 2022.02.3 Build 492 and R version 4.2.1 were used to build the machine learning classification algorithms. Both are freely available. R is an open-source programming language for statistics and data analysis, and RStudio is an open-source integrated development environment (IDE) that provides a graphical user interface for working with R.
2.6. Building the Machine Learning Algorithms
We used RStudio and the "caret" library, a widely used R machine learning package. The dataset was divided into 80% for training and 20% for testing. Five algorithms (support vector machines, AdaBoost, artificial neural networks, decision trees, and logistic regression) were constructed using the training dataset, and each was then evaluated on the testing dataset. Models were built with 10-fold cross-validation on the training dataset: in each of 10 rounds, 90% of the training data was used to fit the model and the remaining 10% to validate it, before the final model was built. The final model was tested on the 20% of the original dataset reserved for testing. A confusion matrix was computed on the testing dataset to measure accuracy, positive predictive value, negative predictive value, sensitivity, and specificity for every machine learning model, at a 95% confidence level.
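The partitioning scheme described above (an 80/20 hold-out split followed by 10-fold cross-validation within the training portion) can be sketched in plain Python. This is an illustrative sketch rather than the caret workflow used in the study: the function names, the random seed, the stratification by class, and the synthetic labels are all assumptions.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with similar class proportions in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_fold(indices, k=10, seed=42):
    """Yield (fit_idx, validation_idx) pairs; each index is validated exactly once."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, validation


# Synthetic labels for illustration only (not the study data).
labels = ["Positive"] * 50 + ["Negative"] * 50
train_idx, test_idx = stratified_split(labels)
folds = list(k_fold(train_idx, k=10))
print(len(train_idx), len(test_idx), len(folds))
```

The held-out 20% is never touched during cross-validation, so the final test metrics are computed on observations the model has not seen at any stage of fitting.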
2.7. Evaluation of the Developed Models
Model performance was assessed using k-fold cross-validation. According to Trevor Hastie, cross-validation is a collection of techniques for evaluating a prediction model's effectiveness on fresh test data. Cross-validation works by splitting the data into two sets: a training set, used to build the model, and a testing (or validation) set, used to estimate the prediction error. Using repeated k-fold cross-validation, we randomly divided the training data into k = 10 folds of equal size. In each round, nine folds (90%) were used to train the model and the remaining fold (10%) was used to assess its performance. We then evaluated the resulting model on the held-out test dataset (20%) to verify its accuracy and validity on unseen observations. The prediction error was calculated as the mean squared difference between the predicted and actual outcomes.
2.8. Performance Measure of the Developed Model
There are several ways to measure how well machine learning models perform, including the receiver operating characteristic (ROC) curve, accuracy, precision, recall, and the F1 score. Accuracy is the proportion of all observations, positive and negative, that the algorithm classifies correctly. It is most often used in balanced classification tasks, where each class is equally important to the researcher. Recall is the proportion of actual positives that were correctly identified, whereas precision is the proportion of positive predictions that were correct. The F1 score is the harmonic mean of precision and recall, combining them into a single statistic; it is most often reported for the positive class.
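These measures all follow from the four cells of a confusion matrix, as the short Python sketch below shows. The counts are illustrative assumptions, chosen so that accuracy, precision, and recall land close to the logistic regression figures quoted in the Discussion (66.67%, 80%, and 33%); the study's actual confusion matrix is not given in this section.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # positive predictive value
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    npv = tn / (tn + fn) if (tn + fn) else 0.0          # negative predictive value
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)             # harmonic mean
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "npv": npv,
        "f1": f1,
    }


# Illustrative counts only (assumed, not taken from the study's results).
metrics = classification_metrics(tp=4, fp=1, fn=8, tn=14)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F1 score is a harmonic mean, it is pulled towards the smaller of precision and recall, so a model with high precision but low recall still receives a modest F1.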
2.9. Knowledge Diffusion Model
The model describes the spread of LTBI knowledge in the population, simulating transitions from being unaware to aware, and subsequently to testing. A compartmental model was adapted, using differential equations to track knowledge spread based on factors like education and barriers to action (e.g., financial constraints).
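A compartmental model of this kind can be sketched as three states (unaware, aware, tested) linked by two flows. The Python sketch below is an assumption-laden illustration, not the study's model: the equations, the parameter names (`campaign`, `contact`, `testing`, `barrier`), their values, and the initial conditions are all hypothetical, and a simple forward Euler step stands in for whatever solver was actually used.

```python
def simulate(months=12.0, dt=0.01, campaign=0.05, contact=0.10,
             testing=0.08, barrier=0.5, u0=0.9, a0=0.1, t0=0.0):
    """Forward-Euler integration of an Unaware -> Aware -> Tested model.

    Assumed equations (population fractions, rates per month):
        dU/dt = -(campaign + contact * (A + T)) * U
        dA/dt =  (campaign + contact * (A + T)) * U - testing * (1 - barrier) * A
        dT/dt =  testing * (1 - barrier) * A
    """
    u, a, t = u0, a0, t0
    for _ in range(int(round(months / dt))):
        to_aware = (campaign + contact * (a + t)) * u   # education and word of mouth
        to_tested = testing * (1.0 - barrier) * a       # slowed by barriers to action
        u -= to_aware * dt
        a += (to_aware - to_tested) * dt
        t += to_tested * dt
    return u, a, t


# Hypothetical slow, medium, and fast awareness scenarios.
results = {name: simulate(campaign=rate)
           for name, rate in [("slow", 0.02), ("medium", 0.05), ("fast", 0.10)]}
for name, (u, a, t) in results.items():
    print(f"{name}: unaware={u:.3f}, aware={a:.3f}, tested={t:.3f}")
```

In this sketch the `barrier` parameter scales down the aware-to-tested flow, which is one simple way to represent financial constraints; raising the `campaign` rate moves people out of the unaware state sooner and therefore increases the tested fraction within the same time window.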
4. Discussion
Our study aimed to develop predictive models for LTBI outcomes using logistic regression, decision trees, and random forests. The key findings indicate that while logistic regression provided higher precision in predicting LTBI-positive cases, the random forest model demonstrated better overall accuracy and offered deeper insights into feature importance, particularly highlighting the role of demographic and knowledge-based factors. The model simulation further showed that targeted education campaigns led to a gradual increase in LTBI awareness and testing among high-risk groups, underscoring the positive impact of interventions. However, significant barriers were identified, including financial constraints and a lack of awareness, which hindered the progression from awareness to action. Addressing these barriers is crucial for improving LTBI testing and treatment rates. Despite its strong performance, the decision tree trailed the random forest by a small margin. The logistic regression model had the lowest recall, missing more LTBI-positive cases. It achieved 66.67% accuracy and 80% precision for positive cases. Top predictors included completion of healthcare-prescribed treatment, HIV status, and employment status, each increasing the likelihood of testing positive. Individuals who completed full treatment for LTBI were 3.6 times more likely to test positive, while higher education reduced the odds. Negative predictors included responses to Q4_4B (“LTBI is a contagious form of tuberculosis that can be easily transmitted to others through respiratory droplets, while active TB is not contagious”) and Q8_8B (“combination therapy with multiple antibiotics”), indicating protective behaviors.
The logistic regression analysis showed that completing a full course of healthcare-prescribed treatments is the strongest positive predictor of LTBI positivity, possibly due to behavioral or knowledge-related factors. Conversely, the belief that LTBI is contagious, although it is not, had a negative influence. Positive predictors for LTBI positivity included Isoniazid monotherapy for 6-9 months and occupation, suggesting higher risk with certain knowledge and work-related exposure. Negative predictors included combination therapy with multiple antibiotics and high-dose antibiotics for short duration. Responses like "believe that LTBI treatment is necessary, even if you don't have symptoms" and "age" also increased positivity. Older individuals were less likely to test positive due to cohort exposure patterns or protective factors. Overall, these factors can influence LTBI positivity. The decision tree model, starting with Q8 ("No treatment is necessary for LTBI"), splits respondents on feature values until a final classification is reached, with respondents who believe treatment is unnecessary more likely to be LTBI-negative. The model predicted LTBI-positive individuals based on treatment concerns, age, and Q9 (“Are there any preventive measures individuals with LTBI should take to avoid developing active TB?”) responses. Older individuals and those with close contact with active TB were more likely to test positive. Factors like Q5_5B (“Close contact with someone with active TB”) increased the likelihood of LTBI-positive results, aligning with TB transmission risks. The tree ended with leaf nodes representing the final classification.
The model classified respondents based on their attitudes towards treatment for LTBI: younger respondents were classified as negative, while those who were older, expressed treatment concerns, or had close TB contact were classified as positive. The model also examined preventive measures, age, and contact with someone diagnosed with active TB, adjusting for positive results. The decision tree model predicted LTBI outcomes based on key factors such as age, treatment completion, attitudes, knowledge, and awareness. Older individuals were more likely to test positive, as were those who completed a full course of treatment. Other key features included belief in the necessity of treatment and completion of the entire course of medication. The random forest model achieved 59.26% accuracy, with age, knowledge, and occupation as top predictors. However, it struggled with recall for LTBI-positive cases. Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”) had the largest positive coefficient, increasing the likelihood of a positive LTBI result. The study found that increased knowledge about LTBI symptoms correlates with testing positive, suggesting the need for targeted awareness campaigns. However, protective behaviors and combination therapy with multiple antibiotics were found to reduce LTBI risk. Employment status was positively associated with LTBI risk, suggesting that occupational exposure may play a role in transmission. The models were evaluated and compared, and the random forest outperformed the decision tree in terms of overall accuracy and F1 score. The random forest and logistic regression models differed in how they weighted predictors of LTBI outcomes. Age was crucial in the random forest model due to demographic patterns, whereas it had a lower impact in logistic regression.
Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”) was the strongest predictor of LTBI positivity in logistic regression. The study revealed that other responses, such as Q10_10A (“Strongly agree”), Q8_8B (“combination therapy with multiple antibiotics”), and Q14_14A (“Lack of awareness”), also influence LTBI outcomes. Respondents selecting Q10_10A, indicating strong agreement with treatment, were more likely to test positive in logistic regression. Q8_8B was more predictive in logistic regression, while Q14_14A was also significant in the random forest. The study analysed logistic regression, decision tree, and random forest models for predicting LTBI outcomes, revealing their strengths and weaknesses, and offering valuable insights for epidemiological understanding and public health interventions. Targeted interventions accelerated high-risk individuals' transitions from unawareness to action, such as testing or treatment for LTBI, while the general population responded more slowly to broader awareness campaigns. Under intervention, both groups transitioned more quickly, with the high-risk group responding faster. Targeted interventions and faster testing reduced the share of high-risk individuals remaining in the unaware and aware states, resulting in quicker action. The general population also showed quicker transitions, highlighting the effectiveness of targeted interventions. The analysis of logistic regression and random forest models revealed the importance of various predictors in determining LTBI positivity. LTBI knowledge about completing prescribed treatments is the most significant predictor in logistic regression, but its importance is lower in the random forest. HIV status is also significant.
The risk of LTBI was significantly higher in HIV-positive individuals, with logistic regression indicating a significant association. Occupation is more influential in the random forest model, highlighting the complexity of occupational exposure. Responses interpreted as protective behaviors, such as the belief that LTBI is contagious, have a negative coefficient in logistic regression, indicating that the models treat predictors differently.
Interpretation of Model Performance
Logistic regression achieved an accuracy of 66.67%, with high precision (80%) for LTBI-positive cases. The strong interpretability of this model makes it valuable for public health applications, where understanding the relationship between specific risk factors (e.g., age, HIV status) and LTBI positivity is crucial for designing interventions. However, the recall of 33% suggests that logistic regression misses a significant number of true LTBI-positive cases, making it less suitable when detecting all positive cases is a priority.
The random forest model provided valuable insights into the most important features contributing to LTBI predictions. Age is the most important feature, with older individuals showing a higher likelihood of positive test results. Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”) is a significant survey response influencing the outcome. Q8 (“No treatment is necessary for LTBI”) is a key attitudinal factor contributing to the prediction of LTBI results. Q5 (“Close contact with someone with active TB”) is an important health-related factor. This analysis reveals that both demographic factors and specific survey responses play critical roles in predicting LTBI outcomes. The random forest model's superior performance can be attributed to its ensemble nature, which reduces overfitting by averaging the predictions of multiple decision trees. In contrast, the single decision tree model relied heavily on a few key splits, resulting in slightly lower accuracy and generalizability. The feature importance analysis underscores the significance of demographic factors like age, as well as attitudes and knowledge about LTBI. This insight suggests that public health interventions targeting older individuals or those with close TB contacts could be prioritized. The random forest model could assist in identifying individuals at higher risk for LTBI, helping prioritize testing and treatment. Awareness campaigns could focus on addressing misconceptions about LTBI treatment as indicated by Q8 (“What are the recommended treatments for LTBI?”) to improve treatment adherence.
The decision tree model had lower overall accuracy (55.56%), yet demonstrated better recall (42%) compared to logistic regression, which highlights its strength in identifying more true positives. However, its high number of false positives reduces its reliability for precise interventions. Random forest provided the best overall accuracy (59.26%) and F1-score (0.63), which suggests that it effectively balances precision and recall. This model's ability to handle complex interactions between demographic and health variables (e.g., age, occupation, and awareness of LTBI) makes it particularly useful for identifying nuanced patterns.
In studies of latent tuberculosis infection and its association with demographic and occupational factors, machine learning models, particularly decision trees and random forests, have identified age, knowledge of LTBI symptoms, and occupation as significant predictors of LTBI positivity. Studies have shown that older adults and individuals with lower awareness of LTBI symptoms are more likely to test positive for LTBI. This correlation is particularly pronounced in healthcare workers (HCWs), who often work in high-exposure environments. Age has been identified as a critical factor influencing LTBI risk. Research indicates that older individuals have a higher likelihood of LTBI positivity, which may be attributed to cumulative exposure over time and a potentially waning immune response to Mycobacterium tuberculosis [
15,
16]. Furthermore, knowledge about LTBI symptoms plays a pivotal role in the likelihood of testing positive. Individuals with limited awareness may not seek testing or treatment, thereby increasing their risk of harboring LTBI without appropriate intervention [
17]. Occupation is another significant determinant of LTBI risk, especially in high-exposure settings such as healthcare facilities. Studies have shown that HCWs are at an elevated risk for LTBI due to their frequent contact with TB patients [
18,
19]. The nature of their work often involves prolonged exposure to infectious agents, which significantly increases their likelihood of contracting LTBI compared to individuals in lower-risk occupations [
15,
18]. For instance, a systematic review highlighted that occupational factors, particularly those involving direct contact with TB patients, were significantly associated with LTBI among healthcare workers [
18,
19]. Moreover, the interplay between these factors suggests that targeted interventions could be beneficial. For example, enhancing awareness and education about LTBI symptoms among older adults and healthcare workers could lead to earlier detection and treatment, thereby reducing the overall burden of LTBI in these populations [
20]. Additionally, implementing regular screening protocols in high-exposure occupations could further mitigate the risk of LTBI transmission and progression to active TB disease [
21].
Feature Importance and Implications for LTBI Risk
The analysis of feature importance in predicting LTBI risk revealed consistent findings across both logistic regression and machine learning models. The logistic regression model identified age, HIV status, and responses to LTBI knowledge questions e.g., Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”) as the most significant predictors of LTBI positivity. This highlights that older individuals, those with HIV, and people with limited knowledge about LTBI are at a higher risk, emphasizing the need for targeted interventions focused on educating these vulnerable groups. Similarly, the random forest model confirmed age as the most influential factor, followed by responses to Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”) and occupation status. The alignment between the two models reinforces the critical role of demographic factors and LTBI knowledge in determining infection risk. These findings suggest that public health efforts should prioritize both demographic risk groups, such as older adults and high-risk occupations, and educational campaigns aimed at increasing awareness of LTBI symptoms and risks [
22,
23].
Discussion of the Knowledge Diffusion Model
The simulation outcomes from the knowledge diffusion model indicated that targeted interventions significantly increased awareness of LTBI, with awareness levels rising from 45% to 65% within six months. However, financial constraints and a lack of awareness, as captured by Q14 (“What barriers do you think may prevent individuals from seeking LTBI testing or treatment?”), remained significant barriers, hindering individuals from progressing to the testing stage. Additionally, the simulated interventions showed that a 30% increase in education programs would lead to a 20% rise in LTBI testing among informed individuals, highlighting the importance of expanding educational outreach to improve testing rates. The machine learning models produced varied results in predicting LTBI. The decision tree model achieved an accuracy of 55.56% and an F1-score of 0.45, outperforming the logistic regression model in recall (42%) but underperforming in precision (50%). Key features influencing the decision tree model included responses to Q8_8C (“Isoniazid (INH) monotherapy for 6 to 9 months”) (0.84) and Q5 (“What are the risk factors for developing LTBI?”), which inquired about close contact with TB patients. These features played a significant role in predicting the likelihood of testing positive for LTBI. In comparison, the random forest model demonstrated superior performance with an accuracy of 59.26% and an F1-score of 0.63. While precision was improved (60%), the model struggled with recall for LTBI-positive cases, achieving only 25%. In terms of feature importance, the random forest model identified age as the most critical predictor, followed by responses to Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”) and occupation status, which significantly contributed to the overall predictions. The comparative analysis of the models revealed varying strengths and weaknesses in predicting LTBI.
While the logistic regression model excelled in precision (80%), its recall was significantly lower (33%), indicating its limitations in identifying LTBI-positive cases. On the other hand, the random forest model offered the best balance between accuracy (59.26%) and F1-score (0.63), making it the most robust model for predicting LTBI outcomes overall. Each model displayed distinct strengths and weaknesses.
Logistic regression provided interpretable results and strong precision, making it particularly useful for public health interventions aimed at preventing false positives. However, its low recall limits its effectiveness in capturing a broader range of LTBI-positive cases. In contrast, the random forest model, although less interpretable, demonstrated greater robustness by handling complex interactions between demographic and health factors more effectively. The findings from the machine learning models have significant public health implications for improving LTBI detection and awareness. The strongest predictors of LTBI positivity across models included age, employment status, and low knowledge of LTBI symptoms. These results suggest that intervention programs should prioritize older populations, employed individuals, and those with limited awareness of LTBI, as they are at higher risk for testing positive. To enhance LTBI detection and awareness, targeted interventions are recommended. Specifically, educational campaigns that address knowledge gaps, such as those highlighted in Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”), and efforts to reduce financial barriers, as noted in Q14 (“What barriers do you think may prevent individuals from seeking LTBI testing or treatment?”), could substantially improve testing rates. Focusing on these high-risk groups and removing obstacles related to cost and awareness would lead to more effective public health outcomes and increased LTBI screening. These findings are consistent with other studies conducted elsewhere [
24,
25].
The diffusion model simulation suggests that targeted educational interventions can enhance awareness and testing rates for LTBI by up to 20% within six months. The study identifies key barriers to effective LTBI management, including financial constraints and a general lack of awareness among at-risk populations. The knowledge diffusion model employed in this study simulates the spread of information regarding LTBI symptoms, testing, and treatment options among various demographic groups. The model incorporates factors such as social networks, communication channels, and the influence of targeted educational campaigns. The simulation was designed to assess the impact of these interventions over six months. The findings from the simulation indicated that targeted educational interventions could lead to a 20% increase in LTBI awareness and testing rates within six months. This increase is attributed to the effective dissemination of information through community health workers, social media campaigns, and educational workshops tailored to specific populations, particularly those at higher risk for LTBI. The results underscore the importance of addressing barriers to LTBI awareness and testing. The first key barrier is financial constraints: many individuals may not seek testing due to the costs associated with healthcare services, including consultations and diagnostic tests [
26,
27]. Providing financial assistance programs and insurance coverage for LTBI testing could help overcome this obstacle. The second barrier is lack of awareness: a significant portion of the population is unaware of LTBI and its implications. Educational interventions targeting high-risk groups, such as healthcare workers, immigrants from high-burden countries, and individuals with compromised immune systems, are crucial for increasing awareness [
28,
29,
30].
Public Health Implications
The public health implications of the findings emphasize the need for targeted interventions to address both high-risk groups and knowledge gaps. Given that age and employment status were significant predictors of LTBI positivity, public health campaigns should prioritize older adults and individuals working in high-exposure environments, such as healthcare and crowded workplaces. By focusing on targeted testing and awareness initiatives in these groups, the number of undiagnosed LTBI cases could be significantly reduced. Additionally, the strong association between LTBI knowledge, as reflected in responses to Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”), and test positivity suggests that increasing awareness about LTBI symptoms and risks is crucial. Educational programs tailored for communities with low awareness levels could assist in the early detection and treatment of infections, ultimately lowering overall infection rates. By addressing these knowledge gaps through targeted partnerships, we can significantly enhance public health efforts to control LTBI in underserved areas and improve overall knowledge.
Model Suitability and Practical Applications
While random forests provide greater accuracy and feature importance insights, logistic regression offers more interpretable results that are easier for policymakers to act upon. For public health interventions aimed at understanding risk factors and designing clear action steps, logistic regression is the preferred model despite its lower recall. The application of machine learning in public health, particularly through models like random forests, offers significant potential for large-scale population health monitoring, especially when it is crucial to capture complex interactions between variables. However, there is a tradeoff between model accuracy and interpretability that must be carefully weighed when using these models in public health decision-making. The analysis of key features revealed that predictors with positive coefficients, such as occupation status (e.g., being employed) and responses to specific questions like Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”) and Q8_8C (“What are the recommended treatments for LTBI?”), increase the likelihood of LTBI positivity. These factors suggest behaviors, conditions, or demographics associated with a higher risk of infection. Conversely, predictors with negative coefficients, such as age, Q4_4B (“LTBI is a contagious form of tuberculosis that can be easily transmitted to others through respiratory droplets, while active TB is not contagious”), and Q8_8A (“High-dose antibiotics for a short duration”), reduce the likelihood of LTBI positivity, potentially indicating protective factors or behaviors. Overall, the findings highlight the strong influence of occupation, age, and specific knowledge and behavior-related responses on LTBI outcomes, underscoring the importance of these factors in predicting infection risk and informing targeted interventions.
The importance of targeted educational interventions in enhancing awareness and testing rates for LTBI has been underscored by various studies employing knowledge diffusion model simulations [
31,
32]. These simulations have demonstrated that such interventions can lead to significant improvements in public health outcomes, particularly in populations at risk for LTBI [
33,
34]. The findings indicate that educational initiatives can potentially increase awareness and testing rates by as much as 20% within six months. However, key barriers, including financial constraints and a general lack of awareness among the target populations [
35,
36], often hinder the implementation of these interventions. Knowledge diffusion models are theoretical frameworks that describe how information spreads within a population. These models can simulate the impact of educational interventions on awareness and behavior change regarding LTBI. For instance, a study by Hermes et al. [
15] utilized a knowledge diffusion model to assess the effects of targeted educational campaigns on LTBI awareness among healthcare workers and high-risk populations. The simulation results indicated that a well-structured educational intervention could lead to a 20% increase in awareness and testing rates within a six-month timeframe. Despite the potential benefits of educational interventions, several barriers impede their effectiveness. Financial constraints are a significant hurdle, particularly in low- and middle-income countries (LMICs) where healthcare resources are limited [
37,
38]. Many individuals may not have access to free or subsidized testing services, which can deter them from seeking LTBI screening [
16,
39]. Additionally, the lack of awareness about LTBI symptoms and the importance of testing contributes to low testing rates. Many individuals may not recognize the risk factors associated with LTBI or may not understand the implications of a positive test result [
17]. In our study, we observed that when awareness spreads slowly, the population takes longer to move from being unaware to taking action. By the end of the 12 months, only a moderate proportion of the population has taken steps such as being tested or treated. This slow uptake suggests that extended periods of low awareness can hinder timely public health responses. A medium rate of awareness diffusion leads to quicker recognition of LTBI-related information, prompting a faster transition to action, and a larger share of the population takes action within the same timeframe than in the slow scenario. This shows how even moderate improvements in awareness campaigns can lead to more effective health outcomes. Rapid dissemination of awareness, driven by aggressive campaigns or interventions, leads to a swift and significant increase in the share of the population that takes action. By the end of the 12 months, more of the population has been tested or treated than in the slow and medium scenarios. This rapid response highlights the value of efficient public health strategies that raise awareness and prompt preventive action quickly. To effectively increase LTBI awareness and testing rates, it is crucial to address these barriers through targeted interventions [
40,
41,
42]. Financial assistance programs, community outreach initiatives, and educational campaigns tailored to specific demographics can help mitigate financial constraints and enhance awareness [
43,
44,
45]. For example, a study by Apriani et al. [
18] highlighted the effectiveness of community health worker-led educational sessions in increasing LTBI knowledge and testing rates among underserved populations.
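The slow, medium, and fast awareness scenarios discussed above can be illustrated with a minimal simulation sketch. It assumes a simple three-compartment structure (unaware, aware, acted) with fixed monthly transition rates. Both the structure and the rate values are hypothetical choices made for illustration and are not the parameters of the knowledge diffusion model used in this study.

```python
def simulate_diffusion(awareness_rate, action_rate=0.15, months=12):
    """Discrete-time compartment model: unaware -> aware -> acted.

    awareness_rate: fraction of the unaware group that becomes aware each month.
    action_rate: fraction of the aware group that gets tested or treated each month.
    Returns the cumulative fraction that has acted at the end of each month.
    """
    unaware, aware, acted = 1.0, 0.0, 0.0
    history = []
    for _ in range(months):
        newly_aware = awareness_rate * unaware
        newly_acted = action_rate * aware
        unaware -= newly_aware
        aware += newly_aware - newly_acted
        acted += newly_acted
        history.append(acted)
    return history


# Hypothetical monthly awareness rates for the three scenarios (assumed values).
scenarios = {"slow": 0.05, "medium": 0.15, "fast": 0.40}
results = {name: simulate_diffusion(rate) for name, rate in scenarios.items()}

for name, history in results.items():
    print(f"{name}: {history[-1]:.1%} of the population has acted after 12 months")
```

Under these assumptions, a higher awareness rate moves people out of the unaware group earlier, which leaves more time for them to act within the 12-month window. This reproduces the qualitative ordering of the three scenarios, not the study's numerical results.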
Limitations of the Study
The random forest model showed improved accuracy, but its complexity limits interpretability, which is essential for decision-making in public health. Additionally, all models exhibited low recall for LTBI-positive cases, indicating the need to improve LTBI detection in future models. This study was conducted only in the Oliver Reginald (O.R.) Tambo District, and time and financial constraints limited it to one clinic, so not all clinics in the municipality were included. As finances and time allow, the study will be expanded to cover other areas, because LTBI affects regions across all provinces of South Africa.
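Because low recall for LTBI-positive cases is the main limitation noted above, it may help to show how recall is computed and how it responds to the classification threshold. The sketch below uses hypothetical labels and predicted probabilities, not data from this study. Lowering the threshold is one common option for raising recall, not a method applied here.

```python
def recall_at_threshold(y_true, y_prob, threshold):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical true labels (1 = LTBI positive) and model probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.62, 0.30, 0.41, 0.35, 0.55, 0.10, 0.48, 0.20]

for threshold in (0.5, 0.4, 0.3):
    print(f"threshold={threshold}: recall={recall_at_threshold(y_true, y_prob, threshold):.2f}")
```

In this toy example, recall rises as the threshold falls, since more positive cases are flagged. The cost is more false positives, so any threshold change would need to be weighed against the resources available for follow-up testing.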
Recommendations & Future Work
Based on these findings, we recommend developing targeted educational campaigns and increasing LTBI testing in high-risk populations, particularly among those who are unaware of the symptoms of LTBI. Future research should collect more data from a larger population and extend the dataset to other rural areas in South Africa. This will help in analyzing and identifying the key demographic, health, and knowledge-related factors that influence LTBI outcomes. In addition, the Department of Health and its healthcare providers, in partnership with other stakeholders, should strengthen educational programmes and awareness of LTBI across all age groups, especially among disadvantaged populations living in congested settings.