3.1. MWD Exploratory Data Analysis
Initially, a statistical analysis was conducted for the rop, tor, fob and bap MWD variables collected by sensors on the drill rigs in both the BR and MM pits. The rop followed a right-skewed distribution in both BR and MM (Figure 2a and Figure 3a). The similar skewness in both formations suggests variable penetration resistance, likely due to bit wear, operator variability, and changing rock properties along the borehole. In contrast, tor followed a normal distribution in both BR and MM, indicating consistent torque requirements. This is likely because torque is automatically regulated based on drilling conditions, which produces a more stable response (Figure 2b and Figure 3b).
Key differences in MWD variables between the BR and MM Formations emerged in the fob and bap distributions. In the BR Formation, fob (Figure 2c) and bap (Figure 2d) followed right-skewed distributions. This suggests the presence of harder, banded lithologies interbedded with softer sediments that require variable forces for penetration. In contrast, fob and bap in the MM Formation followed normal distributions (Figures 3c and 3d, respectively), indicating a more homogeneous rock type with less variability in hardness. The MM Formation was deposited in a more stable sedimentary environment and lacks the extensive structural banding and alternating hardness levels of the BR Formation, leading to a more consistent force requirement during drilling.
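The right-skewed and approximately normal shapes described above can be checked numerically with a sample skewness statistic. The following Python sketch is illustrative only. It assumes the adjusted Fisher-Pearson coefficient and a hypothetical cut-off of ±0.5 for labeling a distribution as skewed; this study states neither. The sample values are made up and are not MWD measurements.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1) of a list of numbers."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data: no spread, hence no skew
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1  # small-sample adjustment


def describe_shape(values, threshold=0.5):
    """Label a distribution using a hypothetical |skewness| cut-off."""
    g = sample_skewness(values)
    if g > threshold:
        return "right-skewed"
    if g < -threshold:
        return "left-skewed"
    return "approximately symmetric"


# Illustrative values only (not MWD data): a long right tail, as described for rop
print(describe_shape([1, 1, 2, 2, 2, 3, 3, 4, 9, 15]))  # prints: right-skewed
```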
These findings underscore the importance of considering geological context when analyzing MWD data, as linear statistical assumptions may not capture the complex interactions between rock mass properties and drilling responses. Moreover, univariate examination of these variables may not describe non-linear relationships between the MWD responses and the geotechnical outputs.
Partial dependence plots provide a powerful tool for analyzing the relationship between MWD variables and target geotechnical properties across different deposit types. The plots isolate the effect of individual MWD parameters while averaging out other influences. In doing so, they reveal how variables such as rop, fob, tor and bap contribute to predicting UCS, FPM, and GSI. This study applies partial dependence to compare these relationships between the BR and MM formations, highlighting formation-specific differences in mechanical resistance, fracturing behavior, and rock mass quality. Understanding these dependencies improves the interpretability of MWD-based geotechnical models, and it supports formation-specific feature selection and more accurate predictions.
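The averaging step behind a one-way partial dependence curve can be sketched in a few lines. For each grid value, the feature of interest is overwritten in every observation, and the model predictions are then averaged over the remaining features. This is a minimal sketch, assuming a generic `predict` callable. The toy linear model, its coefficients, and the data are hypothetical and do not represent the models fitted in this study.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """One-way partial dependence of `predict` on a single feature.

    predict: callable mapping a list of feature values to a number
    rows: list of observations (lists of feature values)
    feature_index: position of the feature of interest
    grid: values at which to evaluate the curve
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)               # copy so the original data are untouched
            modified[feature_index] = value    # force the feature to the grid value
            total += predict(modified)
        curve.append(total / len(rows))        # average over the other features
    return curve


# Hypothetical toy model: output falls with rop (index 0) and rises with fob (index 1)
def toy_model(x):
    rop, fob = x
    return 100.0 - 2.0 * rop + 0.5 * fob


data = [[10.0, 40.0], [20.0, 60.0], [30.0, 80.0]]
pd_rop = partial_dependence(toy_model, data, 0, [10.0, 20.0, 30.0])
```

For a linear toy model, the curve is simply a straight line with slope -2. Applied to a nonlinear model such as RF or NN, the same procedure exposes thresholds and plateaus, such as the bap stabilization described for MM.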
Partial dependence analysis reveals key UCS differences between BR (Figure 4) and MM (Figure 5). In BR, UCS is driven by mechanical resistance, with fob and tor as strong predictors, reflecting higher rock competency. In MM, these relationships are weaker, suggesting that fracturing and mineralogy play a greater role. rop correlates inversely with UCS in BR, indicating that harder rock slows drilling. In MM, by contrast, rop is a weaker predictor, implying that structural factors govern penetration speed. Predicted UCS increases with bap in BR but declines at high values, suggesting flushing inefficiencies. In MM, the bap response stabilizes earlier, indicating faster drilling optimization. These findings highlight formation-specific UCS controls, with BR dominated by mechanical resistance and MM influenced by additional geological factors.
The partial dependence analysis revealed distinct FPM controls in BR and MM (Figure 6 and Figure 7, respectively). In BR, FPM shows a strong inverse correlation with rop, indicating that fractured rock masses enhance penetration. In MM, this relationship is weaker, suggesting that fracturing plays a lesser role. fob and tor are stronger predictors in BR, implying that fractures are more mechanically induced, while in MM, structural factors likely dominate. Predicted FPM increases with bap in BR but stabilizes in MM, indicating that flushing efficiency has less influence on fracture frequency in MM. These findings highlight formation-specific differences: BR fracturing is driven by drilling resistance, whereas in MM, geological discontinuities are more influential.
Figure 8 and Figure 9 show the partial dependence analysis of GSI in BR and MM, respectively. In BR, rop is strongly inversely related to GSI, suggesting that higher-quality rock resists penetration. In MM, this relationship is weaker, indicating that structural factors may play a greater role. fob and tor exhibit stronger positive correlations with GSI in BR, implying that mechanical resistance is a key indicator of rock mass quality. In MM, these effects are less pronounced, suggesting that GSI is less constrained by drilling force. bap influences GSI in both formations, but higher bap is more predictive in BR, likely due to differences in air flushing dynamics in competent rock. These results suggest that MWD-based GSI predictions require formation-specific adjustments, with BR dominated by mechanical controls and MM potentially influenced by additional geological factors.
The measured MWD variables influence UCS, FPM, and GSI differently in BR and MM. In BR, UCS and FPM are strongly controlled by mechanical resistance, with fob and tor as key predictors. In MM, weaker correlations suggest that fracturing and mineralogy play a greater role. rop correlates inversely with UCS and GSI in BR, indicating that higher-quality rock resists penetration. In MM, this relationship is weaker, implying that structural controls dominate. bap affects UCS and FPM more strongly in BR, while in MM its effect stabilizes earlier, suggesting that flushing efficiency is less critical. These results highlight formation-specific controls, with BR driven by mechanical resistance and MM influenced by broader geological factors.
3.1. Feature Importance
To determine the statistical significance of MWD predictors, a Boruta-SHAP approach was applied. This involved augmenting the predictor set with shadow variables (randomly permuted versions of each feature) and then applying SHAP analysis to the expanded feature set. A feature is considered significant when its SHAP importance consistently exceeds that of the best-performing shadow variable across iterations. This section compares the significance of the features used to predict rock mass conditions, to establish how well each MWD parameter describes geotechnical properties. The results of the Boruta-SHAP analysis reveal distinct trends in the influence of MWD variables on UCS, FPM and GSI for the BR and MM Formations, including:
Ratio-based features (baprop, fobrop, torrop) ranked higher than raw features in UCS and GSI prediction.
MSD features (ropS, torS) were critical for FPM, indicating their effectiveness in detecting fracture-related variability.
bap emerged as the most significant raw feature, reinforcing the role of flushing pressure in geotechnical characterization.
These results challenge the conventional assumption that rop and tor are the primary indicators of geotechnical conditions, highlighting the need for multi-variable analysis.
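The construction of the ratio and MSD features is not detailed in this section. The sketch below shows one plausible reading. It assumes that a two-variable name such as baprop denotes the ratio bap/rop, and that the S suffix (e.g. ropS) denotes a moving standard deviation over a short window of consecutive depth samples. Both conventions, the window length, and the zero-division handling are assumptions for illustration only.

```python
import statistics


def ratio_feature(numerator, denominator, eps=1e-9):
    """Element-wise ratio of two logged series (e.g. bap / rop).

    Near-zero denominators yield NaN; this guard is an assumed convention.
    """
    return [n / d if abs(d) > eps else float("nan")
            for n, d in zip(numerator, denominator)]


def moving_std(values, window=3):
    """Trailing moving (population) standard deviation along the hole.

    The first window - 1 samples lack a full window and are returned as NaN.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(float("nan"))
        else:
            out.append(statistics.pstdev(values[i + 1 - window:i + 1]))
    return out


# Hypothetical depth-ordered samples from one hole (illustrative values)
bap = [10.0, 12.0, 11.0, 15.0]
rop = [2.0, 3.0, 2.0, 5.0]

baprop = ratio_feature(bap, rop)    # assumed meaning of "baprop"
ropS = moving_std(rop, window=3)    # assumed meaning of "ropS"
```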
3.1.1. Feature Importance Boruta-SHAP – UCS
The Boruta-SHAP values for UCS prediction in BR (Figure 10) indicate that bap, bapfob, torbap, and torS are the most influential features. This suggests that UCS in BR is primarily controlled by pressure-related variables rather than purely force-based parameters. This challenges the initial assumption that fob and tor would dominate. Instead, higher bap correlates with increased UCS, implying a pressure-dominated response in which greater air pressure is required to penetrate stronger rock. While rop and fob still contribute, their role appears secondary to bap-related variables, suggesting that mechanical resistance remains a factor but is overshadowed by pressure effects.
Boruta-SHAP values for BR range from -20 to +50, reflecting high UCS variability across different sections of the pit, which aligns with the geological complexity of BR, characterized by interbedded high-strength chert and quartz layers. This high variability emphasizes the importance of drilling efficiency and airflow management in UCS interpretation from MWD data. The pressure-driven nature of UCS estimation implies that rock breakage and drill performance are more affected by air pressure regulation than direct force application, reinforcing the critical role of pressure-based drilling adjustments in BR formations.
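Assuming the plotted values are SHAP values of the UCS regressor, they are expressed in the units of the target. A value of +50 therefore means that the feature raised that particular prediction by about 50 MPa relative to the model's average output. For a linear model with independent features, the SHAP value of feature j has the closed form w_j(x_j − E[x_j]), which makes this additivity easy to verify. The sketch below is illustrative only: the weights and data are hypothetical. For nonlinear models such as those used here, SHAP values have to be estimated rather than computed in closed form.

```python
def linear_shap(weights, background, x):
    """Exact SHAP values for a linear model with independent features.

    phi_j = w_j * (x_j - mean_j), where mean_j is taken over the background data.
    """
    n = len(background)
    means = [sum(row[j] for row in background) / n for j in range(len(weights))]
    return [w * (xj - mj) for w, xj, mj in zip(weights, x, means)]


def predict(weights, bias, x):
    """Linear model output: bias + sum of weighted features."""
    return bias + sum(w * xj for w, xj in zip(weights, x))


# Hypothetical linear UCS model in MPa; inputs are (bap, rop)
weights = [0.8, -1.5]
bias = 20.0
background = [[50.0, 10.0], [60.0, 20.0], [70.0, 30.0]]
x = [80.0, 10.0]

phi = linear_shap(weights, background, x)
base_value = sum(predict(weights, bias, row) for row in background) / len(background)

# Additivity: base value + sum of SHAP values reproduces the prediction
assert abs(base_value + sum(phi) - predict(weights, bias, x)) < 1e-9
```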
The Boruta-SHAP analysis for MM UCS (Figure 11) reveals that fobtor, bapfob, fobS, and rop are the most important variables, indicating a stronger influence of force-based interactions in MM than in BR. While bap remains a key factor, MM shows greater importance of fobtor and fobS, suggesting that UCS in MM is controlled by both applied forces and pressure effects. Additionally, rop and ropS hold greater significance in MM than in BR, indicating that penetration rate is more sensitive to variations in rock strength. This suggests that UCS in MM is not purely pressure-driven but is also strongly influenced by drilling force and mechanical loading, which reflects the more homogeneous nature of MM lithologies.
The Boruta shadow variables (_sv) confirm these trends, reinforcing the reliability of the primary predictors. In BR, ropfob_sv, torfob_sv, and bapfob_sv exhibit low SHAP values, confirming that bap and its interactions are genuine UCS predictors. The lower significance of force-related shadow variables further supports the dominance of pressure-based controls in BR UCS estimation. In MM, bap_sv, ropfob_sv, and fobtor_sv also exhibit low influence, reinforcing that while pressure effects remain important, force-based variables play a more significant role in MM UCS predictions. The weaker impact of bap_sv in MM compared to BR suggests that UCS in MM is less affected by pressure-driven variability and relies more on mechanical resistance and penetration efficiency.
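The shadow-variable test can be sketched as a single simplified Boruta round. Every feature is copied and shuffled to form a shadow (_sv) column, and importance is computed for real and shadow columns alike. A real feature is retained only if it outscores the best shadow. Two substitutions keep the sketch dependency-free and should be read as assumptions. First, importance is the absolute Pearson correlation with the target, not mean |SHAP| from a fitted model. Second, only one round is run, whereas Boruta repeats the comparison over many iterations before confirming or rejecting a feature.

```python
import random


def abs_correlation(xs, ys):
    """|Pearson r|: a cheap stand-in importance score (NOT a SHAP value)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def boruta_round(columns, target, importance=abs_correlation, seed=0):
    """One simplified Boruta round.

    columns: dict of feature name -> list of values.
    Returns (names of real features beating every shadow, best shadow score).
    """
    rng = random.Random(seed)
    shadows = {}
    for name, values in columns.items():
        shuffled = list(values)
        rng.shuffle(shuffled)              # permutation breaks any link to the target
        shadows[name + "_sv"] = shuffled   # "_sv" mirrors the naming used in the figures
    best_shadow = max(importance(v, target) for v in shadows.values())
    selected = [name for name, values in columns.items()
                if importance(values, target) > best_shadow]
    return selected, best_shadow


# Synthetic example: "bap" tracks the target, "noise" does not
gen = random.Random(42)
target = [gen.gauss(0.0, 1.0) for _ in range(200)]
columns = {
    "bap": [2.0 * t + gen.gauss(0.0, 0.1) for t in target],
    "noise": [gen.gauss(0.0, 1.0) for _ in range(200)],
}
selected, best_shadow = boruta_round(columns, target)
```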
3.1.2. Feature Importance Boruta-SHAP – FPM
The Boruta-SHAP analysis of FPM highlights distinct fracture detection mechanisms in BR and MM, reflecting their fundamentally different geomechanical behaviors. As shown in Figure 12, the most influential features in BR are baptor, ropS, torS, and torbap. This indicates that fracture detection is primarily controlled by rotational force and penetration rate fluctuations. The dominance of torque-based variables (torS, torbap) suggests that fractures in BR are influenced by rotational resistance, likely due to the deposit's high rock competency and existing fracture networks. In MM (Figure 13), ropfob, rop, torfob, and torbap emerge as the most significant predictors. This suggests that fracturing is driven more by penetration efficiency and applied force than by rotational resistance, pointing to a greater dependence on mechanical loading than on pre-existing structural controls.
The role of interaction-based variables further underscores these contrasting fracture behaviors. In BR, baptor and torbap exhibited strong feature importance, emphasizing the role of pressure-assisted torque in controlling fracture initiation. This suggests that fracturing in BR is more dependent on dynamic drilling interactions, in which changes in penetration rate and torque signal structural weaknesses. In contrast, MM is more influenced by ropfob and torfob, indicating that fracture formation is primarily governed by the interplay of penetration rate and force-based responses rather than pressure alone. This implies that fracturing in BR is more dynamic and controlled by drilling efficiency, while MM is more sensitive to mechanical load variations and stress-induced fracturing.
The distribution of Boruta-SHAP values across both deposits further reinforces these differences in fracturing mechanisms. ropS and torS exhibited a wide range of SHAP values for BR, suggesting that fluctuations in penetration rate and torque can either enhance or suppress fracture formation depending on localized geological conditions. The presence of extreme SHAP values supports the idea that brittle failure mechanisms dominate BR, where abrupt energy release leads to rapid fracture propagation. On the other hand, MM exhibits a more uniform and progressive fracture development process, as seen in the tighter distribution of SHAP values for ropfob and fob. The lower variance in SHAP values in MM suggests that fractures are governed by stress-driven mechanisms, leading to gradual failure rather than abrupt mechanical breakdown.
The Boruta shadow variables (_sv) confirm the robustness of the key predictors. In BR, shadow variables such as ropfob_sv, fobS_sv, and torfob_sv exhibit minimal impact, reinforcing that the real counterparts (ropfob, fobS, torfob) are meaningful indicators of fracture formation. The low influence of force-based and penetration rate shadow variables suggests that fracturing in BR is governed by true drilling responses rather than random fluctuations.
Similarly, in MM, bap_sv, torrop_sv, and fobtor_sv show negligible influence, confirming that the primary features (ropfob, rop, torfob) are legitimate predictors of FPM in MM. The lower impact of shadow variables for MM compared to BR aligns with its more stable geomechanical behavior. This suggests that fracture development in MM is more predictable and less influenced by drilling variability. The limited significance of air pressure-related shadow variables in both deposits further supports the view that fracturing is primarily driven by penetration rate and force-based interactions rather than pressure alone.
3.1.3. Feature Importance Boruta-SHAP – GSI
The Boruta-SHAP results for GSI in BR and MM illustrate distinct patterns in how MWD variables correlate with rock mass quality. In BR, Boruta-SHAP values (approximately -8 to +10) are widely dispersed across their range. This reflects high variability in rock mass quality due to localized geological heterogeneity, such as alternating iron-rich and siliceous bands. In contrast, although MM SHAP values span -15 to +10, they are concentrated around zero, indicating more uniform rock conditions and less short-range variability. The tighter concentration of SHAP values in MM suggests that GSI can be more reliably estimated using steady-state MWD variables, as opposed to the more dynamic drilling responses seen in BR.
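The distinction between the overall range of SHAP values and how tightly they cluster around zero can be made explicit by reporting both the range and the interquartile range (IQR). The made-up values below are not the study's SHAP outputs. They show that one sample can span a wider range than another while still being more concentrated around zero.

```python
import statistics


def spread_summary(values):
    """Return (range, interquartile range) of a sample of SHAP values."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    return max(values) - min(values), q3 - q1


# Made-up SHAP values for illustration only (not the study's outputs)
dispersed = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
concentrated = [-6.0, -0.5, -0.2, 0.0, 0.0, 0.1, 0.3, 0.5, 5.0]

range_d, iqr_d = spread_summary(dispersed)      # narrower range, wider IQR
range_c, iqr_c = spread_summary(concentrated)   # wider range, narrower IQR
```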
Key feature importance rankings further highlight deposit-specific differences in GSI prediction. torrop, bap, and fobS emerge as the most influential variables in BR, with torrop exhibiting the highest importance (Figure 14). This suggests that rotational torque and its interaction with penetration rate strongly influence GSI in BR, likely due to banding, alteration, and fracturing effects. Other key features include fobrop, ropbap, and bapfob, highlighting the role of multiple variable interactions in determining BR geotechnical properties. The strong influence of force-based variables (fob, tor) suggests that rock mass quality in BR is more sensitive to mechanical resistance and drilling force fluctuations.
On the other hand, MM (Figure 15) shows a different feature importance hierarchy, with rop, bapfob, and bap as the dominant predictors. Unlike in BR, bap has a stronger direct relationship with GSI, suggesting that air pressure-based variables are more predictive of rock mass quality in the MM Formation. Additionally, torque-related variables (torrop, torbap) have greater importance in MM than in BR, reinforcing the role of rotational drilling forces in characterizing the MM rock mass. Two observations point the same way: force-based variables are less important in MM, and the variance of Boruta-SHAP values is reduced. Together, these suggest that MM exhibits a more homogeneous geomechanical structure, resulting in more stable and predictable drilling responses than BR.
The Boruta shadow variables (_sv) further validate the robustness of the GSI predictions by distinguishing genuine predictive features from statistical noise. For BR, torbap_sv, fobS_sv, and torrop_sv exhibited minimal importance, confirming that the real counterparts of these variables—such as torrop, fobS, and bap—are genuine indicators of rock mass quality. The presence of multiple interaction-based shadow variables with low significance reinforces the idea that GSI variations in BR are primarily controlled by actual drilling responses rather than random fluctuations.
In MM, bap_sv, rop_sv, and torbap_sv showed similarly low importance, supporting the conclusion that bap and rop remain the dominant drivers of GSI in MM formations. The weaker influence of shadow variables in MM compared to BR aligns with the observation that GSI prediction in MM is more stable and less influenced by short-range variability. This distinction highlights how MWD-based GSI estimation is more complex in BR due to geological heterogeneity, whereas MM allows for more reliable and consistent predictions using less variable drilling parameters.
3.2. Regression-based ML Overview
The predictive performance of five regression-based ML models—DT, SVM, RF, GP, and NN—was evaluated for estimating UCS, FPM, and GSI from MWD data. The results revealed consistent trends across all models, with NN and RF outperforming other approaches in both datasets, particularly in BR where larger data volume and more homogeneous geotechnical conditions contributed to higher predictive accuracy.
Across all geotechnical parameters, NN consistently achieved the highest R² and lowest RMSE, followed closely by RF. The superior performance of NN can be attributed to its ability to model complex, nonlinear interactions between MWD variables and geotechnical properties. RF also demonstrated strong predictive capability, benefiting from its ensemble approach that reduces overfitting by averaging multiple DTs.
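The two evaluation metrics are not defined in this section, so the sketch below assumes their conventional definitions. RMSE is expressed in the units of the target, whereas R² is unitless. The observed and predicted values are hypothetical.

```python
def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the target (e.g. MPa for UCS)."""
    n = len(y_true)
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n) ** 0.5


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (unitless)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical UCS values (MPa) and model predictions, for illustration only
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]
print(r2_score(observed, predicted), rmse(observed, predicted))
```

Because RMSE carries the target's units, the RMSE values for UCS (MPa), FPM, and GSI cannot be compared directly with one another, whereas R² values can.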
Conversely, DT, SVM, and GP exhibited lower predictive accuracy, particularly in MM, which is characterized by higher geological variability. DT was the least effective model, likely due to its sensitivity to noise in the MWD dataset and its tendency to overfit training data while failing to generalize well to test data.
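The ensemble averaging attributed to RF above can be illustrated with a deliberately small forest of depth-one regression trees (stumps). Each stump is fitted to a bootstrap resample using one randomly chosen feature, and the forest prediction is the mean of the stump predictions. This is a simplified sketch for intuition only. The tree depth, number of trees, and synthetic data are hypothetical choices and do not reflect the RF configuration used in this study.

```python
import random


def fit_stump(rows, targets, feature):
    """Depth-one regression tree: best single split on one feature (min squared error)."""
    pairs = sorted(zip((r[feature] for r in rows), targets))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates identical feature values
        left = [t for _, t in pairs[:k]]
        right = [t for _, t in pairs[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[k - 1][0] + pairs[k][0]) / 2, lm, rm)
    if best is None:  # feature is constant in this sample: predict the mean
        mean = sum(targets) / len(targets)
        return lambda row: mean
    _, threshold, lm, rm = best
    return lambda row: lm if row[feature] <= threshold else rm


def fit_forest(rows, targets, n_trees=25, seed=0):
    """Average of stumps, each fitted on a bootstrap sample with one random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample (with replacement)
        feature = rng.randrange(n_features)         # random feature choice per tree/split
        stumps.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feature))
    return lambda row: sum(s(row) for s in stumps) / len(stumps)  # ensemble average


# Synthetic data: target steps from 10 to 30 on feature 0; feature 1 is constant
rows = [[float(x), 0.0] for x in range(20)]
targets = [10.0 if x < 10 else 30.0 for x in range(20)]
forest = fit_forest(rows, targets)
```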
The superior performance of NN and RF can be attributed to their ability to handle high-dimensional, multivariate datasets in which complex interactions exist between input features. In contrast, DT, SVM, and GP were less effective because:
DT is prone to overfitting and lacks the ability to capture intricate geomechanical relationships.
SVM relies on a fixed decision boundary, which is not well-suited to continuous, nonlinear geotechnical responses.
GP, while effective in some cases, is computationally expensive and does not generalize well with the large, noisy datasets typical of MWD applications.
The comparison between NN and RF suggests a trade-off. NN provides the highest predictive accuracy but requires significantly longer training times. RF, by contrast, offers a strong balance between accuracy and computational efficiency, making it a practical choice for real-time mining applications. GP may be useful for smaller datasets but is not well-suited to large-scale MWD data because of its high computational cost.
Given these findings, future work should explore hybrid modeling approaches that combine the interpretability of RF with the high-dimensional learning capabilities of NN. Additionally, model uncertainty quantification should be incorporated to better assess prediction reliability in heterogeneous formations such as MM.
3.2.1. UCS Prediction
For the BR dataset, the NN model outperformed all other approaches, achieving an R² of 0.96 and the lowest RMSE of 7.7 MPa for both the training and test datasets (Table 3). The RF model also exhibited strong predictive capabilities, with R² = 0.94 and RMSE = 9.3 MPa on the test set. SVM and GP demonstrated similar predictive accuracy, with R² values above 0.91 and RMSE values in the range of 10.4 to 11.9 MPa. DT, while computationally efficient, had the lowest predictive accuracy in BR, with R² = 0.90 and RMSE = 12.4 MPa.
In contrast, model performance was notably lower for the MM dataset, with a wider range of R² values (0.46 to 0.84) and higher RMSE values across all models. NN remained the best-performing model for MM, with R² = 0.84 and RMSE = 11.95 MPa. RF followed, achieving an R² of 0.78 and RMSE of 13.95 MPa. SVM and GP displayed moderate predictive ability, with R² values ranging from 0.62 to 0.68. DT performed the poorest, with an R² of 0.46 and the highest RMSE of 21.88 MPa, indicating substantial prediction errors for MM rock strength values. The higher RMSE values in MM suggest that UCS estimation in this formation is more challenging due to greater lithological variability. The lower performance of GP and SVM (R² = 0.62–0.68 for MM) suggests that these models struggle with the nonlinearity inherent in MWD-UCS relationships.
The training times for each model varied, with NN requiring the longest computation time (517 s for BR and 184 s for MM). In contrast, GP was the fastest model, taking only 4 s for BR and 2.4 s for MM. DT and SVM also demonstrated relatively fast training times, with SVM completing training in under 10 s for both deposits. RF had moderate computational demand, requiring 60 s for BR and 66.1 s for MM.
These results indicate that NN and RF offer the best compromise between predictive accuracy and robustness, particularly for the BR dataset. The MM dataset exhibits greater prediction uncertainty, likely due to increased geological heterogeneity or reduced data volume. Future work will focus on exploring the underlying causes of performance degradation in MM predictions, refining feature selection techniques, and integrating additional geotechnical parameters to improve model generalization.
3.2.2. FPM Prediction
The BR results presented in Table 4 demonstrate that the NN model outperformed all other approaches, achieving an R² of 0.98 and the lowest RMSE of 1.0 for both the training and test datasets. This is likely due to its ability to capture highly nonlinear fracture formation mechanisms. The RF model also exhibited strong predictive capabilities, with R² = 0.96 and RMSE = 1.3 on the test set. SVM and GP demonstrated similar predictive accuracy, with R² values above 0.93 and RMSE values ranging from 1.6 to 1.8. DT, while computationally efficient, had the lowest predictive accuracy in the BR dataset, with R² = 0.94 and RMSE = 1.7.
Model performance was lower for all methods in MM, with R² values ranging from 0.53 to 0.93. The NN model remained the best-performing approach, achieving R² = 0.93 and RMSE = 2.2, followed by RF (R² = 0.87, RMSE = 2.9). SVM and GP displayed moderate predictive ability, with R² values between 0.76 and 0.84. DT exhibited the weakest performance, with R² = 0.53 and RMSE = 5.5, because it tends to oversimplify fracture-related interactions, which leads to poor generalization.
The training times for each model varied, with NN requiring the longest computation time (Table 3). In contrast, GP was the fastest model, taking only 4 s for both BR and MM. DT and SVM also demonstrated relatively fast training times, with SVM completing training in under 10 s for both datasets. RF had a moderate computational demand, requiring 77 s for BR and 59 s for MM.
Overall, these results indicate that NN and RF provide the most robust predictive performance for both BR and MM datasets, with NN yielding the highest accuracy but at a higher computational cost. The MM dataset exhibits greater prediction uncertainty, likely due to increased geological heterogeneity or reduced data volume. Future investigations will focus on identifying the sources of variability in MM predictions, optimizing feature selection, and integrating additional geotechnical parameters to enhance model generalization.
3.2.3. GSI Prediction
For the BR dataset, the NN model achieved the highest accuracy, with an R² of 0.99 and the lowest RMSE of 2.0 for both the training and test datasets (Table 5). The RF model also performed well, with R² = 0.98 and RMSE = 2.6, followed by SVM (R² = 0.97, RMSE = 3.2) and GP (R² = 0.97, RMSE = 2.9). DT had the lowest accuracy among the models tested, with R² = 0.94 and RMSE = 4.3, but remained computationally efficient.
In contrast, MM model performance was comparatively lower across all methods, with R² values ranging from 0.68 to 0.93. NN remained the best-performing model, achieving R² = 0.93 and RMSE = 2.4, followed by RF (R² = 0.87, RMSE = 3.2). SVM and GP displayed moderate predictive ability, with R² values between 0.78 and 0.86, while DT exhibited the weakest performance (R² = 0.68, RMSE = 5.1). The performance gap in MM suggests that GSI estimation is more sensitive to variations in MWD-derived parameters, particularly bap and torrop.
The training times varied among models, with NN requiring the longest computation time (255 s for BR and 297 s for MM). In contrast, DT was the fastest model, requiring only 2 s for both datasets. RF had moderate computational demands (80 s for BR, 60 s for MM), while GP and SVM were relatively fast (5–8 s across both datasets). The high R² values of NN and GP in BR (0.99 and 0.97, respectively), where geological variability may be lower, suggest potential overfitting.
Overall, NN and RF offer the best balance between predictive accuracy and robustness, particularly in GSI prediction for BR deposits. The MM dataset exhibits greater prediction uncertainty, which may necessitate further investigation into geological variability and dataset quality. Future work will focus on examining model generalization, refining feature selection techniques, and optimizing hyperparameters to mitigate overfitting while maintaining prediction accuracy.