1. Introduction
Acoustic data is an indispensable tool in marine science and is widely used to map and analyze sea-bottom characteristics. This non-destructive testing (NDT) approach is especially useful for soil investigations, which include bathymetric mapping of the ocean-floor topography, shallow and deep acoustic surveys of the subsoil, identification of existing infrastructure on and beneath the seafloor, and underwater positioning [1,2,3,4,5,6,7,8,9,10,11,12]. The marine industry relies heavily on calibrated backscatter intensity to classify the composition of the upper layer of the sea bottom, using remote sensing tools that have become standard in the field.
The scientific literature shows that research into the spectral analysis of the acoustic response of the seabed is relatively limited and focuses on large-scale, non-localized side-scan data. Single beam type sonars, such as sub-bottom profilers or multibeam sonars, are rarely studied in comparison. While some studies have explored signal and image processing techniques, occasionally in combination with AI algorithms, no reliable quantitative classification method has yet been established [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]. Other studies focused on calibrated backscatter intensity methods [29,30], sometimes supplemented by sub-bottom profiler data for geological background [31]. The available literature indicates that the primary means of assessing sub-bottom soil composition in acoustic seabed studies is backscatter intensity, as exemplified by [32], which often requires extensive calibration such as that provided by the Geocoder algorithm [33]. A comprehensive review by Anderson et al. [34] emphasized the need for careful calibration when relating acoustic backscattering measurements to the sub-surface properties and content of the seabed. The few studies that have attempted to identify sea-bottom soil types from spectral characteristics have mostly been limited to side-scan sonar data, which is inherently limited by the scale of the features it captures [27]. Moreover, a single swath of acoustic data may comprise multiple soil types, which limits the applicability of such methods. While they may yield valuable insights when applied to large, relatively homogeneous sea-bottom regions, they may be less effective in more complex, heterogeneous areas containing a variety of soil types, such as rock, sand, and clay.
Artificial intelligence (AI) techniques have contributed significantly to acoustic data classification in marine science, providing a more objective and efficient approach to seabed mapping. The study in [35] used hybrid artificial neural networks that combined a self-organizing feature map with learning vector quantization to classify four manganese nodule-bearing sites, using angular backscatter intensity as the learning and classification feature. The resulting model achieved classification accuracies of 87% to 95%. Additionally, [36,37,38,39] used a feed-forward neural network and convolutional neural networks to predict source charge and bottom composition (mud and sand combinations) for SUS (signals, underwater sound) charges. The features ranged from simulated pressure time series, through extracted features such as peak level, integrated level, signal length, and decay time, to simulated peak pressure and backscatter intensity. The model accuracies ranged from 84% to 97%, the latter for the convolutional neural network used in [37]. These studies illustrate the potential of AI algorithms for classifying acoustic data for marine soil characterization.
The classification of soil based on multibeam echo sounder (MBES) data has been the focus of recent studies [40,41,42]. In one such study, [40] applied support vector machine classification to MBES bathymetric and backscatter data to classify mud, sand, and gravel, achieving an accuracy of 90%. Similarly, [41] used deep neural networks to classify the same materials from bathymetric and backscatter data that were reduced using fuzzy ranking, reaching an accuracy of 86%. Another study [42] achieved a classification accuracy of 93% for mud, sand, and gravel combinations using deep learning on backscatter, bathymetry, and the angular response and mosaic texture of MBES data. These studies demonstrate the efficacy of AI techniques, specifically support vector machines and deep neural networks, for soil classification based on MBES data: with features such as bathymetry, backscatter, angular response, and mosaic texture, they reach accuracies of 86% to 93%. These findings have important implications for the mapping and characterization of seabed environments and highlight the potential of AI techniques for future research in marine science.
The current state-of-the-art literature shows that the features used in machine learning approaches to seabed classification rely heavily on integral characteristics such as backscatter intensity as a function of the angle of incidence, mosaic texture, and bathymetry [35,36,39,40,41,42]. While these features have proven effective in seabed mapping and classification, the actual time series or spectrum backscattered from a point on the seafloor has rarely been used, if at all. An exception is the study in [43], which demonstrated the potential of the frequency-domain representation of the time series of a reflected chirp sub-bottom profiler signal for distinguishing between sand and sandstone. The results were promising: a classifier based on the number of times the spectrum crosses 1/16 of the maximal normalized power assessed the probability of sand or sandstone with over 80% certainty in over 75% of the cases. Incorporating this type of approach into machine learning models for seabed mapping and classification may provide a more comprehensive understanding of seafloor structures and their associated ecological communities, ultimately leading to better management and preservation of marine resources. Future research should explore the use of time series or spectrum data in machine learning approaches for the characterization of marine environments, as it may lead to significant advancements in the field.
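The crossing-count feature attributed to [43] can be illustrated with a short Python sketch. It is not the implementation used in [43]: the function names, the naive DFT, and the convention that a crossing is any change between adjacent bins from above to below the threshold (or the reverse) are assumptions made for illustration. The rule that maps the crossing count to a sand or sandstone probability is not reproduced here.

```python
import cmath
import math


def normalized_power_spectrum(samples):
    """Naive DFT power spectrum over non-negative frequencies, scaled so its maximum is 1."""
    n = len(samples)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        power.append(abs(acc) ** 2)
    peak = max(power)
    return [p / peak for p in power] if peak > 0 else power


def count_threshold_crossings(spectrum, threshold=1 / 16):
    """Count changes between 'above threshold' and 'not above threshold' in adjacent bins."""
    above = [p > threshold for p in spectrum]
    return sum(1 for a, b in zip(above, above[1:]) if a != b)


# Hypothetical test signal: a single tone centered on DFT bin 4 of a 32-sample record.
n = 32
tone = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
tone_spectrum = normalized_power_spectrum(tone)
# One narrow spectral peak is crossed twice: once going up and once going down.
tone_crossings = count_threshold_crossings(tone_spectrum)
```

A spectrum with many separate peaks above the 1/16 level yields a larger count than one with a single smooth lobe, which is what makes the count usable as a scalar feature. In practice an FFT routine would replace the naive DFT.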
The main hypothesis of the research in [43] was that the spectral features of acoustic signals reflected from sand and sandstone sea bottoms arise from essential differences in the physical properties of these two media. These properties include the fine-scale topography at the top of both types of sediment as well as the heterogeneity within several meters (depending on the signal length) below the top of the reflector. As these properties differ significantly between sand and sandstone, they are expected to affect the acoustic signals reflected from the top of the sea bottom, and in turn their spectral parameters such as amplitude, main frequency, frequency-dependent reflection coefficient, and number of spikes. However, the main disadvantage of the presented method was the qualitative choice of the spectral parameters used for classification, which resulted in less-than-optimal accuracy.
In the present study, the discrete values of the reflected time series and spectra obtained by [43] were used as training sets for two logistic regression models [44]. This machine-learning technique was found to be effective for the classification of sand and sandstone and to compete with much more complicated machine-learning techniques, such as the convolutional neural networks used in previous studies [35,36,37,38,39,40,41,42]. The logistic regression models classified sand and sandstone with high accuracy and confidence, making them a viable and practical option for the classification of these seabed sediment types.
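As an illustration of how discrete spectrum values can serve directly as features for a binary logistic regression classifier, the following Python sketch trains a model by batch gradient descent. It is a minimal sketch under stated assumptions, not the models of this study: the synthetic spectra, the class labels 0 and 1, the learning rate, the number of epochs, and all function names are hypothetical, and no measured data are used.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return class 1 if the modeled probability is at least 0.5, else class 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def make_synthetic_spectrum(label, rng, n_bins=16):
    """Hypothetical toy spectrum: class 1 carries extra energy in the upper half of the band."""
    return [rng.random() * 0.2 + (0.5 if label == 1 and k >= n_bins // 2 else 0.0)
            for k in range(n_bins)]


rng = random.Random(0)
labels = [i % 2 for i in range(200)]
spectra = [make_synthetic_spectrum(lab, rng) for lab in labels]

# Train on the first 150 spectra and verify on the remaining 50.
w, b = train_logistic_regression(spectra[:150], labels[:150])
accuracy = sum(predict(w, b, x) == t
               for x, t in zip(spectra[150:], labels[150:])) / 50
```

Each spectral bin receives its own weight, so the model learns which parts of the band separate the two classes instead of relying on a hand-picked scalar parameter. A library implementation with regularization would normally be preferred over this hand-written loop.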
5. Discussion and Conclusions
Logistic regression is applied to a wide range of tasks in geophysics in general and in marine geophysics in particular [45,46,47,48,49,50,51,52], for example, to study changes in soil properties [45] and ocean processes [46]. However, this approach is rarely used for sea-bottom soil classification. The results presented above show that the two sea-bottom types can be successfully and quantitatively characterized by applying logistic regression models to either the backscatter time series of a frequency-modulated signal or the spectrum of that backscatter.
The classification accuracy achieved over the verification sets is 90% for the time series and 94% for the spectra. The better results for spectral data may be due to the possibility of removing noise at frequencies outside the bandwidth of the reflected signal. Note that the models were trained on a relatively small dataset, which suggests that even higher accuracy may be achievable with larger datasets.
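The suggested advantage of spectral features, namely that out-of-band noise can be discarded before classification, can be sketched in Python as follows. This is an assumed illustration, not the processing used in this study: the function name, the sampling rate, the record length, and the band limits are hypothetical values chosen only to make the example concrete.

```python
def band_limited_features(spectrum, sample_rate_hz, n_samples, f_low_hz, f_high_hz):
    """Keep only the spectral bins whose center frequency lies inside [f_low_hz, f_high_hz].

    `spectrum` holds the non-negative-frequency bins of a DFT of an
    `n_samples`-long record, so bin k sits at k * sample_rate_hz / n_samples.
    """
    bin_width_hz = sample_rate_hz / n_samples
    return [value for k, value in enumerate(spectrum)
            if f_low_hz <= k * bin_width_hz <= f_high_hz]


# Hypothetical 9-bin spectrum from a 16-sample record sampled at 16 kHz (1 kHz per bin).
spectrum = [0.9, 0.8, 0.1, 0.4, 0.7, 0.3, 0.9, 0.8, 0.9]
features = band_limited_features(spectrum, sample_rate_hz=16000.0, n_samples=16,
                                 f_low_hz=2000.0, f_high_hz=5000.0)
# Bins 2 to 5 fall inside the assumed 2-5 kHz band: [0.1, 0.4, 0.7, 0.3]
```

No equally simple selection exists for the raw time series, where in-band signal and out-of-band noise are mixed in every sample. This is consistent with the explanation offered above for the higher accuracy obtained with spectra.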
The results of this study show that machine learning algorithms have the potential to improve the accuracy of sonar-based soil classification compared with manually extracted classification criteria [43].
The model achieved accuracy comparable to that of [43] with a training set approximately 60% the size of the training set required by [43]. This may be important when collecting data during actual survey activities, where the survey vessel is less stationary and hence less data is collected over each area and corresponding soil type, making the method presented here more advantageous for soil classification based on data collected during standard hydrographic surveys.
As for feature selection for machine learning algorithms, it is assumed that the reflected spectrum (and the corresponding time series) of a frequency-modulated signal is affected by reflector characteristics such as grain size, relief, voids, and stratification. Hence, using this data as features is superior to using integral features such as peak pressure, total intensity, and angular intensity, as used in previous works on sonar data [35,40,41,42]. Applying a relatively simple logistic regression model to spectral data achieved higher accuracy than neural network methods applied to integral data (intensity, angular intensity, elevation, etc.) in the case of sonar data. Based on these findings, it is assumed that combining spectral analysis with machine learning, and using the spectral series as features, can enhance the performance of machine learning algorithms for sonar-based soil classification. Applying spectral data as features for more advanced machine learning algorithms, and combining it with other types of data such as the angle of incidence, holds great potential for future research and for the enhancement of remote marine soil classification.
Based on the results, the study presented here established the principles of a soil classification method that achieves high classification accuracy and is easily applicable to data collected in standard hydrographic surveys.