1. Introduction
Computer vision has advanced greatly in recent years with deep learning approaches pushing the boundaries of image analysis and object recognition [
1]. These approaches have created opportunities for the segmentation of complex and irregular objects such as rock particles in geological images. Accurate identification and segmentation of rock particles is highly significant in geology [
2], and other industries such as engineering construction [
3,
4,
5], mining [
6,
7], and the petroleum industry [
8], to mention a few. In geology, for example, geological analysis has been shown to be essential to understanding the Earth’s history, processes, and composition, and is therefore vital in lithology, mineral prospecting, environmental studies, and hazard assessment [
9]. Before carrying out this analysis, geologists carefully study rock formations, where they can identify geological markers, from microscopic grains to large slabs of rock, that provide essential information for understanding ancient ecosystems, geological transitions, and current geological dynamics [
10]. By examining the shapes, sizes, and textures of these rock particles, geologists can reveal key characteristics of the Earth’s geology, minerals, sedimentary processes, and climate, thereby providing a stronger foundation for explaining the Earth’s past and predicting its future [
11]. However, carrying out any form of statistical analysis to extract useful information from these rock particles is only possible after the accurate segmentation of the particles.
Traditional methods of rock segmentation have required manually identifying and classifying the rocks, either physically or from photographs. This subjective particle recognition process is laborious, time-consuming, and prone to standardization errors as data volumes grow, rendering it inefficient and unreliable. This led to the adoption of image segmentation techniques such as thresholding, watershed, region- and edge-based segmentation, and clustering [
12,
13], to address the limitations of the manual method. However, these segmentation techniques have been found to be ineffective on batches of images, particularly when the images contain occluded objects or irregular shapes of different sizes. This is because they expect the objects in the images to exhibit specific characteristic features relative to their background, lighting, size, or shape. They also require manual tuning of their parameters and fail to produce accurate segmentations when the objects do not have clear boundaries, as they work best on high-contrast images.
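To illustrate why global thresholding depends on contrast, the following minimal Python sketch implements single-level Otsu thresholding on a flat list of 8-bit intensities. It is an illustration only, not the implementation used in the cited works; the function name `otsu_threshold` and the toy pixel values are assumptions made for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance
    of the two classes {p <= t} and {p > t}."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the dark class
    sum0 = 0  # intensity sum of the dark class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break     # bright class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# Hypothetical high-contrast image: dark background, bright particles.
pixels = [20] * 50 + [30] * 50 + [200] * 50 + [220] * 50
t = otsu_threshold(pixels)
foreground = [p > t for p in pixels]
```

On this high-contrast toy image, any threshold between the two intensity groups separates them cleanly. When particle and background intensities overlap, no single threshold can do so, which is the limitation described above.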
Matthew [
14] implemented a multi-Otsu threshold algorithm [
15] to segment a synthetic rock image dataset generated from sandstone and shale rock. The algorithm was ineffective for greyscale rock objects with high noise levels and failed completely on non-greyscale (textural contrast) images. Qinpeng et al. [
5] proposed an adaptive watershed algorithm for segmenting blasted rock particles based on the rock contour solidity threshold. The algorithm reduced errors from blurry edges and overlapping particles. Still, it was not robust enough, as it required different solidity thresholds for different rock types, each of which had to be set manually. Haibo and Jialing [
16] combined characteristics of the Fuzzy C-means clustering (FCM) algorithm with a semi-supervised Support Vector Machine (SVM) for segmenting rock images by taking into account the spatial information of the image pixels. Amankwah and Aldrich [
17] proposed a novel watershed-based approach for segmenting rocks, solving the watershed over-segmentation problem by using adaptive thresholding to generate object markers. This ensured that the watershed segmentation lines correctly matched the object boundaries. Although these techniques represent substantial progress over manual segmentation, they are not robust enough for rock particles of varying sizes and shapes.
Consequently, classical machine learning algorithms, including supervised methods such as K-nearest neighbors (KNN), SVM, and ensemble learners (AdaBoost, Random Forest), as well as unsupervised methods such as K-Means clustering, have shown promise in recognizing and processing rock particles from digital images [
18,
19,
20,
21]. These algorithms require a large, diverse dataset of particle images with corresponding labels so that the resulting model can learn the intricate features of the data. Junxing and Roman [
22] utilized the AdaBoost ensemble algorithm with varying aspect ratios and sliding window orientation angles for recognizing sand particles. However, although the authors used a large training dataset of 85,000 images, the proposed method still failed to detect particles with irregular surfaces and elongated shapes. Therefore, there is a need for a more robust and adaptable system that can learn hierarchical representations of complex particle features under varied object boundaries and lighting conditions, which has led to the adoption of deep learning systems [
4,
23,
24,
25].
In recent years, deep learning computer vision systems have led the way in capturing global features of objects, pushing the boundaries of image analysis and object recognition [
1], and creating opportunities for the segmentation of complex and irregular objects such as rock particles in geological images. These systems offer robust capabilities in recognizing complex object features: they are trained on data with pixel-wise labels so that the model can learn the unique mappings of the rock objects [
26]. Karolina and Sebastian [
20] proposed a multi-textural approach to segmenting rock grains using SVM, KNN, and a fully connected Artificial Neural Network (ANN) trained with the Levenberg-Marquardt backpropagation method. Although the ANN classifier performed best, with an accuracy of 79%, the authors identified the need for post-processing techniques to improve the border thinning and automatic closing of the delineated contours of the rock particles. With the remarkable achievements that followed the emergence of Convolutional Neural Networks (CNNs) [
27], the efficiency and accuracy of segmentation techniques were enhanced through semantic segmentation architectures such as the Fully Convolutional Network (FCN) and U-Net [
7,
24,
28,
29,
30,
31,
32], which classify every image pixel into a class label. Huang et al. [
28] utilized an FCN for the segmentation of cracks and leakage defects in tunnels, achieving improved performance compared with traditional techniques such as adaptive thresholding and region growing. In the mining industry, Yang et al. [
7] leveraged depth-separable convolutions with feature-depth concatenation to improve the rock particle segmentation accuracy, using an enhanced U-Net model architecture. Zhou et al. [
24] proposed a dual U-Net with multi-scale inputs and side-output for the automatic segmentation of muck images. Even though the U-Net algorithm was improved to produce more accurate results, it encountered limitations when segmenting overlapping rock objects, which led to under-segmentation and over-segmentation. Additionally, the proposed algorithm did not account for regional features of the rocks such as the edges and the rock surface. Liang et al. [
29] developed a deep convolutional neural network based on the U-Net architecture to extract particle projections from raw images. However, the authors highlighted the lack of generalization in the model’s predictions when attempting to segment rock types that differed from those used in the training dataset.
Due to its design, semantic segmentation encounters limitations when the objects are densely packed with overlapping particles, leading to under-segmentation or over-segmentation [
24]. Whereas semantic segmentation assigns every pixel of the same class the same label, instance segmentation has been shown to accurately distinguish overlapping instances by delineating object boundaries, making it easier to carry out further independent actions on the individual rock particles. For example, size calculation, shape identification, or any form of counting or tracking application becomes more straightforward. Instance segmentation algorithms such as Mask R-CNN [
33], in the Detectron2 [
34] framework, are better suited to identifying each individual object instance, making them more adaptable and efficient for rock recognition. Fan et al. [
4] utilized the Mask R-CNN model to segment rockfill particles by leveraging the ResNeXt101 backbone with the squeeze and excitation block, enhancing the feature extraction capability. Trong et al. [
35] developed an automatic means of measuring blast fragments in open-pit mines using the Mask R-CNN algorithm, although the authors observed a decline in segmentation accuracy due to poor spatial resolution of the image. As the application of instance segmentation methods to rock recognition is fairly novel, this research study provides a stepping stone for further research.
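The difference between the two label types can be shown on a toy example. The sketch below is an illustration under assumed data, not part of the proposed method: two touching particles are drawn in a hypothetical 4 x 6 grid, once as a semantic mask (1 = rock) and once as an instance map (one integer ID per particle). The helper names `count_components` and `instance_areas` are introduced here for illustration only.

```python
from collections import Counter, deque


def count_components(mask):
    """Count 4-connected foreground regions in a binary mask (list of lists)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def instance_areas(instance_map):
    """Return {instance_id: pixel_area}, ignoring background (ID 0)."""
    return dict(Counter(v for row in instance_map for v in row if v != 0))


# Two touching particles: a semantic mask cannot tell them apart.
semantic = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
# The same scene with one ID per particle.
instance = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
]

merged_regions = count_components(semantic)  # touching particles merge
areas = instance_areas(instance)             # per-particle sizes remain available
```

Connected-component analysis on the semantic mask finds a single merged region, whereas the instance map keeps a separate area for each particle, which is what per-particle size or shape analysis needs.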
In this study, Mask R-CNN and Detectron2 models are implemented for recognizing rock particles by leveraging the transfer learning technique, which saves time and delivers excellent results with limited training data [
36]. First, the quality of the training data is improved using a Gaussian filter and an Illumination Adaptive Transformer (IAT) [
37], to denoise the images and adjust the lighting exposure, respectively. Then, the models are trained on the annotated, preprocessed data with different hyperparameters, after which the performance of the proposed algorithms is evaluated on a test dataset with diverse environments using the standard Average Precision (AP) metric and compared with similar research works. The rest of the paper is organized as follows:
Section 2 covers the proposed methodology, introducing the dataset with the segmentation algorithms.
Section 3 describes the training process implemented on the preprocessed data as well as the steps involved in evaluating the developed models. In
Section 4, the results from the training experiments are discussed alongside identified drawbacks. Finally,
Section 5 concludes the paper with recommendations for future work.
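As background for the AP metric mentioned above, the following simplified Python sketch computes mask Intersection over Union (IoU) and a non-interpolated AP at a single IoU threshold. It is a sketch under stated assumptions, not the evaluation code used in this study: masks are represented as sets of (row, column) pixels, each prediction is greedily matched to the best remaining ground-truth mask, and the names `mask_iou` and `average_precision` are hypothetical. Evaluation toolkits such as the COCO evaluator additionally interpolate the precision-recall curve and average over several IoU thresholds; this sketch does neither.

```python
def mask_iou(a, b):
    """IoU of two masks given as collections of (row, col) pixels."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(predictions, ground_truths, iou_threshold=0.5):
    """Non-interpolated AP at one IoU threshold.

    predictions: list of (confidence, mask); ground_truths: list of masks.
    """
    matched = set()  # indices of ground-truth masks already claimed
    hits = []        # True/False per prediction, in descending confidence
    for _, mask in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            iou = mask_iou(mask, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            hits.append(True)
        else:
            hits.append(False)

    # Average the precision values at each rank where a true positive occurs.
    true_positives, precision_sum = 0, 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / len(ground_truths) if ground_truths else 0.0


# Hypothetical toy data: two ground-truth particles, three predictions.
ground_truths = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
predictions = [
    (0.9, {(0, 0), (0, 1), (1, 0), (1, 1)}),  # exact match
    (0.8, {(9, 9)}),                          # false positive
    (0.7, {(5, 5), (5, 6)}),                  # exact match
]
ap = average_precision(predictions, ground_truths)
```

In this toy case the false positive ranked second lowers the precision at the second true positive to 2/3, so the AP is (1 + 2/3) / 2.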