1. Introduction
The detection of road potholes is a critical issue in transportation safety, as these defects can compromise vehicle integrity and driver safety. Potholes, formed through the combined effects of traffic stress and environmental factors, contribute significantly to road infrastructure degradation, resulting in increased maintenance costs, vehicle damage, and accidents. Studies indicate that potholes accounted for approximately 0.8% of road accidents in 2021, contributing to 1.4% of fatalities and 0.6% of injuries annually [1]. Additionally, the deterioration of road surfaces due to heavy traffic and adverse weather conditions can lead to potholes as deep as 10 inches [2]. This affects vehicle performance and increases operational costs for drivers, with potholes estimated to add approximately $3 billion annually in costs in Canada alone [3].
Accurate detection and risk assessment of potholes are crucial to mitigating their impact on safety and optimizing road maintenance. Machine learning techniques such as the YOLO (You Only Look Once) object detection algorithm can detect potholes in real time, allowing officials to take the necessary action promptly [4,5,6]. YOLO models, part of the deep learning family, have proven highly effective at pothole detection across different settings and therefore play a critical role in road safety management [7,8]. Furthermore, incorporating techniques such as depth estimation reveals more about the severity of potholes and their effects on vehicles, which further aids maintenance decisions [9].
This study leverages YOLOv9 for accurate instance segmentation together with Mask R-CNN, and combines them with a Multi-Criteria Decision-Making (MCDM) framework to address the limitations of previous models. While earlier YOLO-based approaches, such as YOLOv8, were effective at marking and detecting potholes, they could not identify potholes that are shallow but still contribute to road imbalance [10]. This limitation is significant, as shallow yet widespread potholes can also pose risks to vehicle stability and safety. The YOLOv8 model achieved training and validation losses of 0.06 and 0.04, respectively, but its reliance on bounding boxes restricted its ability to capture geometric details and assess the impact of individual potholes accurately. Similarly, the study by Gorro et al. employed YOLOv8 for pothole detection using bounding boxes [11]. While the results were promising, the approach struggled to detect potholes that are shallow but large in area, which can still cause significant road imbalance. This limitation led to increased false positives [11].
Building on this foundation, the current study utilizes YOLOv9’s instance segmentation capabilities to generate detailed masks of potholes, capturing their exact shapes and dimensions. These masks are analyzed to calculate key geometric properties, such as area, perimeter, and estimated depth, which are critical indicators of the severity of the potholes. The integration of an MCDM framework allows for evaluating each pothole based on multiple weighted criteria—such as size, depth, location, and shape irregularities—to rank them according to their potential hazard levels. This prioritization ensures optimal resource allocation for repairs, improving road safety and reducing costs.
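The ranking step described above can be illustrated with a weighted-sum model, one of the simplest MCDM methods. The sketch below is only an illustration under stated assumptions: the `Pothole` fields, the weights, the units, and the min-max normalization are hypothetical choices rather than the study's actual configuration, and the location criterion is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Pothole:
    name: str
    area: float          # mask area (hypothetical units, e.g. cm^2)
    depth: float         # estimated depth (hypothetical units, e.g. cm)
    irregularity: float  # shape irregularity score (hypothetical definition)


# Hypothetical criterion weights; they sum to 1 so scores fall in [0, 1].
WEIGHTS = {"area": 0.4, "depth": 0.4, "irregularity": 0.2}


def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_potholes(potholes, weights=WEIGHTS):
    """Return (name, score) pairs sorted from most to least hazardous."""
    scores = {p.name: 0.0 for p in potholes}
    for criterion, weight in weights.items():
        normalized = min_max([getattr(p, criterion) for p in potholes])
        for p, value in zip(potholes, normalized):
            scores[p.name] += weight * value
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


example = [
    Pothole("A", area=1000.0, depth=2.0, irregularity=1.2),
    Pothole("B", area=4000.0, depth=8.0, irregularity=1.8),
    Pothole("C", area=2500.0, depth=5.0, irregularity=1.0),
]
ranking = rank_potholes(example)
print(ranking)
```

In this toy input, pothole B is largest, deepest, and most irregular, so it ranks first. Changing the weights shifts the ranking toward whichever criterion a maintenance agency considers most important.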
Ensemble learning ensures that both models collaborate to detect potholes robustly, using YOLOv9 for rapid instance segmentation and Mask R-CNN for precise boundary refinement.
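The text does not specify how the two models' outputs are combined, so the following is only a sketch of one plausible fusion rule. It assumes each detection is a (box, confidence) pair, that detections are matched by box IoU, and that a matched pair keeps the Mask R-CNN geometry together with the higher of the two confidences. The function names and the 0.5 IoU threshold are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse(yolo_dets, mrcnn_dets, iou_thresh=0.5):
    """Match each YOLO detection to its best-overlapping Mask R-CNN detection.

    Assumed rule (not taken from the paper): a matched pair keeps the
    Mask R-CNN box and the larger confidence; unmatched YOLO detections
    pass through unchanged.
    """
    fused = []
    for y_box, y_conf in yolo_dets:
        best = max(mrcnn_dets, key=lambda d: box_iou(y_box, d[0]), default=None)
        if best is not None and box_iou(y_box, best[0]) >= iou_thresh:
            fused.append((best[0], max(y_conf, best[1])))
        else:
            fused.append((y_box, y_conf))
    return fused


yolo = [((0, 0, 10, 10), 0.40), ((50, 50, 60, 60), 0.35)]
mrcnn = [((1, 1, 10, 10), 0.90)]
fused = fuse(yolo, mrcnn)
print(fused)
```

In practice the same matching could be done on masks rather than boxes; boxes are used here only to keep the example short.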
This study focuses on the research question:
1.) Can ensemble learning (YOLOv9 instance segmentation and Mask R-CNN) and an MCDM framework reliably detect potholes?
4. Results and Discussion
Training Result Analysis
The training and validation results for the YOLOv9e instance segmentation model show effective learning and stable performance. The smoothed curves for the training losses (box, segmentation, classification, and distribution focal loss) decrease steadily, suggesting consistent improvement in object localization, segmentation, and classification. The validation losses follow a similar pattern, although a modest rise in segmentation loss in the later epochs signals potential overfitting, which could be addressed with additional regularization or early stopping. Precision, recall, and mean Average Precision (mAP) for bounding boxes and masks improve steadily and plateau at high levels, demonstrating the model’s good detection and segmentation abilities. Overall, the results indicate a well-optimized model with good precision and recall, although further tuning may improve segmentation performance by addressing the potential overfitting seen in the validation loss.
The confusion matrix gives a detailed evaluation of the YOLOv9e model’s ability to detect potholes. The model correctly classified 1,932 true potholes, demonstrating its capacity to detect actual cases. However, it misclassified 1,548 genuine potholes as background, a high proportion of false negatives, which means that many potholes were missed during detection. In addition, 1,051 background instances were mistakenly classified as potholes, resulting in false positives, and the matrix contains no correct background predictions. These findings show that, while the model is capable of identifying potholes, it does not yet differentiate reliably between potholes and background. This highlights the need for additional model optimization, notably to reduce false negatives and false positives, in order to improve its practical applicability in real-world conditions.
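The counts reported above can be turned into precision, recall, and F1 directly. The short sketch below uses only the three figures given in the text; the function name is illustrative.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts from the confusion matrix discussed in the text.
precision, recall, f1 = detection_metrics(tp=1932, fp=1051, fn=1548)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With these counts, precision is about 0.65, recall about 0.56, and F1 about 0.60, which quantifies the imbalance described above: roughly one in three predicted potholes is a false alarm, and nearly half of the true potholes are missed.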
Evaluation of Model Performance (YOLOv9 Only)
Figure 3.
Confusion Matrix Result
Figure 4 depicts the Precision-Confidence Curve, which shows the relationship between precision and confidence threshold for detecting potholes. As the confidence threshold rises, the model’s precision gradually improves, reflecting fewer false-positive detections. At a confidence level of 0.908, the model achieves a precision of 1.00 for all classes, meaning that it predicts only true positives at such high thresholds. The graph also shows that precision begins relatively low at low thresholds and increases steadily, implying that the model initially includes a larger number of incorrect predictions that are filtered out as the threshold becomes stricter. This curve is important for identifying the confidence level that best balances precision and recall in practical applications.
Mask Precision Curve
The Recall-Confidence Curve depicted in Figure 5 assesses the model’s ability to detect potholes at various confidence levels. The curve shows how recall varies as the confidence threshold is increased. At low confidence levels, recall is highest (about 0.81 for all classes at a confidence level of 0.0), showing that the model detects the majority of potholes. However, as the confidence threshold grows, recall declines, because the model becomes stricter in its detections and misses some potholes. This behavior demonstrates the trade-off between recall and confidence, with lower thresholds favoring higher recall and higher thresholds emphasizing precision. The model also retains a moderate recall at mid-range confidence levels, making it suitable for applications that require wide detection coverage.
Mask Recall Curve
The Precision-Recall (PR) curve gives a comprehensive view of the YOLOv9e model’s pothole detection capabilities. The graph shows a smooth trade-off between precision and recall, with an overall mean Average Precision (mAP) of 0.556 at an IoU threshold of 0.5. This indicates a moderately balanced detection capability, limiting false positives while maintaining a fair recall rate. The gradual decline of the PR curve indicates that the model behaves consistently across different confidence thresholds. However, further tuning may improve precision at higher recall values, thereby increasing overall robustness.
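Average precision summarizes a PR curve as the area under it. The sketch below shows one common convention (all-point interpolation with a monotone precision envelope). It is not necessarily the exact procedure used by the training framework in this study, and the sample points are made up for illustration.

```python
def average_precision(recalls, precisions):
    """Area under the PR curve using all-point interpolation.

    Assumes `recalls` is sorted in ascending order and that
    `precisions[i]` is the precision measured at `recalls[i]`.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall,
    # which makes the curve monotonically non-increasing.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# Made-up PR points for illustration only.
ap = average_precision([0.2, 0.4, 0.6, 0.8], [1.0, 0.8, 0.6, 0.5])
print(round(ap, 3))
```

For a single-class problem such as pothole detection, the mean over classes reduces to this one value, so mAP and AP coincide.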
Mask Precision-Recall Curve
Figure 6.
Precision-Confidence Curve
The F1-score for all classes, calculated with a confidence level of 0.282, is 0.58. This demonstrates the YOLOv9e model’s balanced performance, with a slight trade-off between precision and recall. The F1-score represents the model’s ability to detect potholes effectively while producing an acceptable number of false positives and false negatives. This score indicates that the model performs well, but there is potential for future improvement to increase detection accuracy and reliability in practical circumstances.
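An F1-confidence curve is obtained by sweeping the confidence threshold and recomputing F1 at each value; the reported operating point is the threshold where F1 peaks. The sketch below illustrates this under simple assumptions: each detection is a (confidence, is_true_positive) pair already matched against ground truth, and the sample data and function names are hypothetical.

```python
def f1_at_threshold(detections, n_ground_truth, threshold):
    """F1 when only detections with confidence >= threshold are kept."""
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    fn = n_ground_truth - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(detections, n_ground_truth, thresholds):
    """Threshold from the candidate list that maximizes F1."""
    return max(thresholds, key=lambda t: f1_at_threshold(detections, n_ground_truth, t))


# Hypothetical matched detections: (confidence, is_true_positive).
dets = [(0.9, True), (0.8, True), (0.6, False), (0.5, True), (0.3, False), (0.2, False)]
n_gt = 4  # hypothetical number of ground-truth potholes
candidates = [0.1, 0.4, 0.7]

best = best_threshold(dets, n_gt, candidates)
print(best, f1_at_threshold(dets, n_gt, best))
```

In this toy data the middle threshold wins: the lowest threshold admits too many false positives, and the highest discards true detections.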
Mask F1-Score Curve
Figure 7.
F1-Confidence Curve
Figure 8 illustrates the masking validation of the test set. The results show that some potholes are detected with a relatively low confidence score of 0.5. In the proposed pothole detection system, YOLOv9 was therefore configured to output predictions at a lower confidence threshold, and these predictions were then filtered using the proposed algorithm.
Masking and Detection Analysis
Figure 9 illustrates the masking validation results after integrating the MCDM algorithm, which allows detection of objects with low confidence scores. Detection accuracy increases significantly because the YOLOv9 model, in some cases, assigns low confidence scores to genuine potholes, which would otherwise be discarded. To address this issue, the prediction parameter was adjusted to allow predictions with confidence scores as low as 0.3. The proposed algorithm was then applied to minimize false positives, as low confidence scores can also lead to incorrect detections.
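The two-stage idea described here, admitting low-confidence predictions and then filtering them, can be sketched as follows. Only the 0.3 lower threshold comes from the text. The upper threshold, the severity cutoff, the dictionary keys, and the rule itself are hypothetical and stand in for the proposed algorithm, whose details are not given in this section.

```python
LOW_CONF = 0.3       # lower confidence threshold stated in the text
HIGH_CONF = 0.5      # hypothetical: at or above this, accept without further checks
MIN_SEVERITY = 0.6   # hypothetical: MCDM score required for low-confidence detections


def filter_detections(detections):
    """Keep confident detections, plus low-confidence ones with a high MCDM score.

    Each detection is a dict with a 'conf' value (model confidence) and a
    'severity' value (an MCDM score in [0, 1]); both keys are illustrative.
    """
    kept = []
    for det in detections:
        if det["conf"] >= HIGH_CONF:
            kept.append(det)
        elif det["conf"] >= LOW_CONF and det["severity"] >= MIN_SEVERITY:
            kept.append(det)
    return kept


candidates = [
    {"id": 1, "conf": 0.82, "severity": 0.20},  # confident: kept
    {"id": 2, "conf": 0.35, "severity": 0.75},  # low confidence, severe: kept
    {"id": 3, "conf": 0.32, "severity": 0.10},  # low confidence, minor: dropped
    {"id": 4, "conf": 0.15, "severity": 0.90},  # below the lower threshold: dropped
]
kept_ids = [d["id"] for d in filter_detections(candidates)]
print(kept_ids)
```

The design intent is that lowering the threshold recovers missed potholes, while the second criterion keeps the extra false positives in check.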
Figure 10 shows the new confusion matrix when using the ensemble learning and MCDM criteria. The result shows an estimated 20% increase in accuracy due to the increase in true positive detection of potholes.
New confusion matrix after applying ensemble learning and MCDM criteria
The new F1-Confidence curve demonstrates a well-balanced trade-off between precision and recall. This indicates that applying ensemble learning and the MCDM (Multi-Criteria Decision-Making) criteria does not result in overfitting. Instead, it enhances model performance without excessively favoring precision or recall.
Improved F1-curve
Figure 11.
Improved F1-curve
Higher precision across confidence levels means that the model produces fewer false positive predictions at every threshold. Ensemble learning helps by merging several decision boundaries, which lowers prediction uncertainty. The integration of MCDM ensures that decisions are informed and optimized across a variety of criteria (e.g., confidence, true positive rates, or context-specific parameters). The smooth and consistently higher precision observed across all thresholds suggests that the model retains its robustness and generalizability.
However, applying overly specific custom criteria to fine-tune the model could lead to overfitting, as it may bias the model towards particular data characteristics.
To explore the weaknesses of our proposed algorithm, the weights of the defined criteria were adjusted, and the ensembled model was tested on previously unseen data. As shown in Figure 12, the results indicate overfitting, as the model’s performance becomes overly specific to certain patterns in the training data. This is evident from the confusion matrix, where the detection of ’pothole’ dominates, leading to poor generalization for the ’background’ class. Similarly, the F1-confidence curve highlights this issue with a steep, narrow peak, indicating that the model performs well only within a specific confidence range while failing outside of it. This overfitting behavior emphasizes the limitations of defining too many criteria or applying excessive weighting adjustments, which hinder the model’s generalization to unseen data.
5. Conclusions
This study developed an enhanced pothole identification system by combining ensemble learning (YOLOv9 and Mask R-CNN) with a Multi-Criteria Decision-Making (MCDM) framework to improve pothole detection accuracy and prioritization. By pairing YOLOv9 for quick detection with Mask R-CNN for precise segmentation, the system merges the two models’ detection outputs to improve accuracy. The novel use of low-confidence thresholding when prioritizing key potholes has proven to be a considerable improvement, allowing high-severity defects to be detected even under less strict confidence criteria.
With extensive training on 5477 annotated pothole samples, the system achieved outstanding performance metrics, including a mean Average Precision (mAP) of 0.935 at 0.5 IoU and an F1-score of 0.94 at a confidence level of 0.576. Finally, the algorithm demonstrated a 20% increase in the accuracy of detecting critical potholes, ensuring a reliable identification of high-priority road defects. This study underscores the system’s potential to address real-world infrastructure management challenges by facilitating timely and informed decision-making.
Several recommendations are provided to improve the proposed pothole detection system’s capabilities and real-world applicability. First, the system must be tested in a variety of real-world contexts to determine its robustness and adaptability to changing road conditions, illumination, and weather. Expanding the training dataset to include more samples from various geographies and road conditions might help enhance model generalization and performance. Furthermore, introducing adaptive weight adjustments within the MCDM framework would enable the system to better prioritize region-specific demands, such as urban versus rural road maintenance requirements. Continuous optimization of the detection algorithm, including exploring advanced techniques such as transformer-based models or real-time processing enhancements, could further improve detection accuracy and speed.