1. Introduction
Deep underground spaces are characterized by low background noise, low cosmic ray intensity, and limited electromagnetic radiation. These features make underground laboratories crucial locations for research in microphysics, astrophysics, cosmology, and geoscience [1]. Over the past decades, numerous underground laboratories have been constructed and continuously developed worldwide, including SNO in Canada [2], Kamioka in Japan [3], Modane and LSBB in France [4,5], Boulby in the UK [6], Baksan in Russia [7,8], and SarGrav in Italy [9,10], among others. These laboratories range in volume from a few hundred to thousands of cubic meters, with vertical rock cover from a few hundred meters to over 2,000 meters thick. The thick cover provides an ideal low-background environment for experimental research across various disciplines [11]. Some underground laboratories have also conducted multi-geophysical field observations. At LSBB in France, for example, superconducting magnetometers have been used to record magnetic field signals, demonstrating the inherent advantages of deep underground environments for capturing weak electromagnetic signals [12]. Since 2020, collaborative multi-geophysical field observations have been conducted in the Huainan underground laboratory (HNLab) in China. These observations demonstrate the advantages and potential of underground environments for sustained, stable geophysical field observations [13,14,15,16,17,18,19]. However, geophysical field observations in HNLab are still at an early stage [16], and numerous challenges remain, including experimental design, renovation of the underground laboratory infrastructure, and instrument research and development. Furthermore, it is essential to make full use of observational data from these ‘ultra-silent’ and ‘ultra-clean’ environments and to explore their potential for denoising conventional surface observations.
The magnetotelluric (MT) method uses natural time-varying electromagnetic (EM) fields to probe subsurface geoelectrical structure. It has been widely employed to investigate crust and mantle structure, subsurface deformation, and deep processes. However, natural EM field signals are weak and easily contaminated by anthropogenic noise, which significantly limits the practical effectiveness of MT. Obtaining stable EM signals with a high signal-to-noise ratio has therefore become a key focus of MT technology development [20,21]. A typical MT data processing algorithm aims to optimally stack partial transfer functions or power spectral density matrices across all frequency-domain windows. This involves identifying, removing, and down-weighting outliers in the time and/or frequency domain. However, measured data often deviate from a Gaussian distribution, which makes conventional least-squares estimation less effective. Manually labeling time slices or partial power spectral density matrices is a direct way to improve stacking, but it is time-consuming and inefficient. Robust estimators based on various regression algorithms can mitigate the impact of outliers caused by random impulsive noise [22,23,24,25]. However, robust estimates are difficult to obtain when noise persists throughout the entire observation period. Another effective approach is the remote reference method, which uses an additional “clean” remote station to suppress noise that is uncorrelated between the local and remote stations [26,27,28]. This method is commonly applied using the magnetic channels, exploiting the strong regional correlation of magnetic signals. When array datasets are available, frequency-domain principal component analysis of Fourier coefficient vectors from multiple channels at multiple stations is a reliable processing approach [29,30,31,32]. It reduces high-dimensional datasets to a lower-dimensional representation, improving processing efficiency. However, such multi-channel algorithms require data of sufficient quality and quantity to achieve optimal results. Most of the methods mentioned above rely on the short-time Fourier transform. MT data processing also makes use of other transforms and signal processing algorithms, such as the wavelet transform [33], the Hilbert–Huang transform [34], empirical mode decomposition [35], compressive sensing [36], Wiener filtering [37], independent component analysis [38], and blind source separation [32], among others. These methods can improve processing outcomes for noise with typical distributions, provided that their parameters are appropriately chosen. However, they are difficult to apply effectively when the noise is strong and complex.
Processing the raw time series directly, through noise recognition, removal, and signal reconstruction, is a direct and effective way to enhance the signal-to-noise ratio. In recent years, owing to continuous advances in computing, deep learning algorithms have offered unique advantages in signal analysis and data processing and have become crucial tools in geoscience [39]. These algorithms have been successfully applied to optimize and process EM field data. Manoj and Nagarajan [40] used artificial neural networks to identify and remove noise segments from EM data; this method requires manual identification and labeling of features, such as data quality and signal correlation, in the training dataset. Li et al. [41] employed a convolutional neural network (CNN) to construct a nonlinear mapping between noisy data and noise contours. Li et al. [42] proposed a denoising algorithm combining CNNs and long short-term memory (LSTM) recurrent neural networks: signal and noise are first separated using CNNs, and an LSTM model is then trained on the CNN-denoised data to predict clean data from noisy signals. However, significant misfits may occur when the time series contains continuous noise or when noisy segments are too long. Han et al. [43] developed a noise suppression method using LSTM networks and used the reconstructed data in 2D inversion, effectively suppressing strong noise with typical morphological features. However, accuracy decreases when noisy and clean data exhibit similar morphological features. Most deep learning-based noise suppression methods are designed for single-channel data. Zhang et al. [44] proposed a multi-channel approach using CNNs and applied it to audio magnetotelluric data processing, although its effectiveness for longer-period data remains uncertain. The efficacy of supervised learning methods depends on the availability of training datasets that encompass the principal morphological features of both signal and noise. In addition, unsupervised, data-driven learning techniques based on sparse coding have been employed to identify, separate, and suppress noise in EM data, such as K-SVD dictionary learning [45,46,47], improved shift-invariant sparse coding [48], and adaptive sparse representation [49], among others. However, these methods require appropriately chosen coefficients, which often takes multiple trials to achieve optimal results. Furthermore, two dataset conditions are necessary for them to be effective: a sufficiently high signal-to-noise ratio and strong regularity [45].
Robust trained models play a crucial role in successful time series denoising. Supervised learning algorithms generally use labeled training datasets to build models that generalize [50], and a rich, representative training dataset is crucial for good generalization in practical applications. Traditionally, training datasets are constructed by superimposing typical EM noise, such as charge and discharge triangle waves, square waves, peak noise, and step noise, onto clean measured and/or theoretical data [41,42,43,44]. However, it can be challenging to construct sample libraries that cover the various complex noise features of practical data. Although data-driven algorithms such as dictionary learning based on sparse coding [45,46,47,48,49] can obtain atoms that match the target signal directly from the observed data, the applicability of such methods under persistent, strong noise interference still requires further investigation.
In this study, we propose a signal reconstruction method based on LSTM neural networks using underground EM observation data. The method eliminates the need to pre-construct a database containing a large number of synthesized signals and noises. Instead, it uses clean and stable multi-channel data from underground observations as the training set in a data-driven manner, allowing the model to fully learn the variations and characteristics of the regional EM field. Noise suppression, or “cleaning,” is then achieved by reconstructing the noisy segments of synchronously observed ground data.
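A first step in a workflow of this kind is cutting synchronous records into fixed-length training windows. The plain-Python sketch below shows one way to do this. Which channels serve as inputs and which as the target, the window length, and the step are hypothetical choices for illustration, not the configuration used in this study; the `standardize` helper is likewise an assumed preprocessing step.

```python
def standardize(channel, mean=None, std=None):
    """Z-score a channel; pass training-set mean/std when preparing data to reconstruct."""
    if mean is None:
        mean = sum(channel) / len(channel)
    if std is None:
        std = (sum((x - mean) ** 2 for x in channel) / len(channel)) ** 0.5
    if std == 0:
        raise ValueError("constant channel cannot be standardized")
    return [(x - mean) / std for x in channel], mean, std


def make_windows(inputs, target, window, step):
    """Cut synchronous channels into (input window, target window) pairs.

    inputs: equally long input channels (hypothetical choice, e.g. Hx, Hy, Ex, Ey)
    target: one channel of the same length (hypothetical choice, e.g. Hx)
    Each input window is a list of `window` time steps, and each time step holds
    one value per input channel, i.e. the (time, features) layout an LSTM expects.
    """
    n = len(target)
    if any(len(c) != n for c in inputs):
        raise ValueError("channels must be synchronous and of equal length")
    pairs = []
    for start in range(0, n - window + 1, step):
        x = [[c[t] for c in inputs] for t in range(start, start + window)]
        y = target[start:start + window]
        pairs.append((x, y))
    return pairs
```

Reusing the training-set mean and standard deviation when standardizing the data to be reconstructed keeps both datasets on the same scale, so that the model output can be mapped back to physical units afterwards.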
3. Synthetic Experiments
Four types of simulated noise, including charge and discharge triangle waves, Gaussian noise, square waves, and peak noise, were randomly superimposed onto the synthetic time series with a sampling rate of 1000 Hz and a length of 10 s.
Figure 4 illustrates the reconstruction results for time series contaminated with these different types of noise. The reconstruction effectively suppresses all four typical types of EM noise, and the reconstructed data agree closely with the original clean data in both waveform and amplitude.
To further assess the robustness of the reconstruction technique, the four types of noise were randomly combined and added to the original time series.
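The noise synthesis described above can be sketched in plain Python. The clean test signal, the noise amplitudes and periods, and the exact waveform shapes (for example, modelling charge and discharge noise as a symmetric triangle wave) are assumptions for illustration; the section does not specify them. All generators share one signature so they can be combined freely, even where a parameter such as `fs` is unused.

```python
import math
import random

FS = 1000              # sampling rate (Hz), as in the synthetic tests
N = FS * 10            # 10 s of data -> 10,000 samples


def clean_signal(n=N, fs=FS):
    """Hypothetical clean series: two low-frequency sinusoids."""
    return [math.sin(2 * math.pi * 0.5 * i / fs) + 0.5 * math.sin(2 * math.pi * 3.0 * i / fs)
            for i in range(n)]


def triangle_noise(n, fs, rng, amp=3.0, period_s=1.0):
    """Charge/discharge noise, approximated as a periodic triangle wave with random phase."""
    period = int(period_s * fs)
    offset = rng.randrange(period)
    out = []
    for i in range(n):
        phase = ((i + offset) % period) / period     # position within the cycle, 0..1
        out.append(amp * (2 * phase if phase < 0.5 else 2 * (1 - phase)))
    return out


def square_noise(n, fs, rng, amp=2.0, period_s=2.0):
    """Periodic square wave switching between +amp and -amp, random phase."""
    period = int(period_s * fs)
    offset = rng.randrange(period)
    return [amp if (i + offset) % period < period // 2 else -amp for i in range(n)]


def gaussian_noise(n, fs, rng, sigma=0.5):
    """Zero-mean white Gaussian noise."""
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def peak_noise(n, fs, rng, amp=10.0, count=20):
    """Isolated spikes of random sign at `count` distinct random samples."""
    out = [0.0] * n
    for i in rng.sample(range(n), count):
        out[i] = rng.choice((-1.0, 1.0)) * amp
    return out


NOISES = {"triangle": triangle_noise, "square": square_noise,
          "gaussian": gaussian_noise, "peak": peak_noise}


def contaminate(signal, kinds, fs=FS, seed=0):
    """Return a copy of `signal` with the listed noise types superimposed."""
    rng = random.Random(seed)
    noisy = list(signal)
    for kind in kinds:
        noise = NOISES[kind](len(signal), fs, rng)
        noisy = [s + v for s, v in zip(noisy, noise)]
    return noisy


clean = clean_signal()
single = {kind: contaminate(clean, [kind], seed=1) for kind in NOISES}   # one noise type each
combined = contaminate(clean, list(NOISES), seed=2)                     # all four together
```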
Figure 5 illustrates the reconstruction results for the time series contaminated with this complex noise combination. Even under the influence of complex random noise, the reconstructed data agree well with the original clean data in amplitude range, waveform shape, and overall trend. The minor differences remaining in the reconstructed data are considered acceptable, given the severe distortion of the noise-contaminated data.
The mean absolute percentage error (MAPE, Equation (11)) and the symmetric mean absolute percentage error (SMAPE, Equation (12)) were used to evaluate the denoising performance. In these equations, n is the number of samples, y denotes the original data, and ȳ denotes the reconstructed data. MAPE measures the mean absolute percentage deviation of the reconstructed data from the original data, whereas SMAPE normalizes each absolute error by the magnitudes of both the original and reconstructed values, which makes it symmetric and less sensitive to samples where the original data are close to zero. Lower MAPE and SMAPE values indicate better denoising performance.
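The two metrics can be computed as below. The sketch assumes the widely used definitions MAPE = (100%/n) Σ |yᵢ − ȳᵢ| / |yᵢ| and SMAPE = (100%/n) Σ |yᵢ − ȳᵢ| / ((|yᵢ| + |ȳᵢ|)/2). The factor 1/2 in the SMAPE denominator is an assumption, since some variants omit it, and skipping samples with a zero denominator is likewise a choice made here.

```python
def mape(y, y_rec):
    """Mean absolute percentage error (%) of the reconstruction y_rec against original y.

    Samples with y == 0 are skipped because the relative error is undefined there.
    """
    terms = [abs(a - b) / abs(a) for a, b in zip(y, y_rec) if a != 0]
    return 100.0 * sum(terms) / len(terms)


def smape(y, y_rec):
    """Symmetric MAPE (%), assuming the |y - y_rec| / ((|y| + |y_rec|) / 2) variant.

    Each term lies between 0 and 2, so this SMAPE is bounded by 200%.
    """
    terms = [abs(a - b) / ((abs(a) + abs(b)) / 2)
             for a, b in zip(y, y_rec) if abs(a) + abs(b) > 0]
    return 100.0 * sum(terms) / len(terms)


original = [1.0, 2.0, 4.0]
reconstructed = [1.1, 2.0, 3.6]
print(mape(original, reconstructed), smape(original, reconstructed))
```

Because each SMAPE term is bounded, a few samples where the original data pass close to zero cannot dominate it the way they can dominate MAPE.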
Quantitative evaluation results for the reconstructed data shown in Figure 4 and Figure 5 are summarized in Table 1. For the four single-noise datasets, MAPE is approximately 2% or less and SMAPE is below 1%. For the combined-noise dataset, MAPE is only 3.85% and SMAPE is 1.09%. These results indicate that the LSTM network model effectively suppresses noise components for both single-type noise and complex multi-noise interference, demonstrating the efficacy and reliability of the method.
5. Discussion and Conclusions
This study presents an EM data reconstruction method based on LSTM recurrent neural networks trained on data observed in a deep underground environment with low background noise. The LSTM network model is tailored to capture the complex coupling relationships and dynamic characteristics of multi-channel EM field components. The method focuses on reconstructing strongly noise-contaminated segments recorded synchronously at noisy stations, typically ground stations. Key characteristics of this approach include the following:
(1) Typical MT noise suppression methods utilizing CNNs or RNNs are primarily supervised learning algorithms. The effectiveness and robustness of both model training and denoising processing hinge on the availability of diverse and extensive training datasets encompassing both signals and noises. Training datasets often include synthetic data representing typical types of noise and/or high-quality measured signals, aiding in noise identification, separation, and reconstruction for specific observed signals [41,42,43]. However, EM field signals exhibit strong spatial and weak temporal correlation and are susceptible to contamination from random and complex noise sources. As a result, synthetic data-derived signal and noise patterns may not fully encapsulate the diverse characteristics of observed data, limiting the applicability and reliability of these methods. The inherent complexities and variability of real-world EM field data necessitate novel approaches that can adapt and generalize effectively to diverse noise conditions and signal characteristics.
In this study, the training dataset and the processing dataset are observed synchronously, so within a given spatial range the two datasets are strongly correlated and coupled, particularly in the magnetic field components. Unlike approaches that focus solely on characterizing the morphological contours of noise, our method exploits the strong spatial correlation of EM fields and the high signal-to-noise ratio of underground data, using the intrinsic characteristics of synchronous observations and the coupling between different EM components, learned through deep learning, to reconstruct the data. Importantly, this process requires no prior knowledge of noise types, and it reduces the training sample size required compared with traditional methods, thereby improving processing efficiency. Moreover, traditional methods typically train and process each component independently. In contrast, our multidimensional reconstruction approach not only learns the spatiotemporal characteristics of individual components but also establishes a robust network mapping based on the couplings among different EM components. This holistic approach enables more effective and efficient data reconstruction while capturing complex interdependencies among the EM field components.
(2) The methodology employed in this study is formally similar to the remote reference method, as both approaches use reference station data observed in a low-EM-noise environment and exploit the strong regional spatial correlation of EM fields. However, there are fundamental differences between them. Firstly, traditional remote reference methods, such as those proposed by Gamble et al. [27] and Clarke et al. [28], along with variants such as the remote reference with ‘magnetic control’ (RRMT) introduced by Varentsov [67,68], operate in the frequency domain. These methods exploit the absence of noise correlation between remote and local channels to suppress noise in the power spectral density matrix. In contrast, the method presented in this study uses deep learning to establish a network mapping that reconstructs noise-affected data segments in the time domain. Secondly, traditional remote reference methods focus on the signal correlation between local and remote stations, computing correlations independently for each component, whereas our method considers the correlations among all channels of both the local and remote stations. Varentsov [67] did, however, introduce the horizontal magnetic tensor into the remote reference method, which likewise incorporates constraints arising from the correlation between different magnetic field components.
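The inter-station relation behind the horizontal magnetic tensor can be written [Hx, Hy]ᵀ(local) = M [Hx, Hy]ᵀ(reference), with M a 2 × 2 complex tensor in each frequency band. The sketch below estimates M by ordinary least squares from simulated Fourier coefficients. It illustrates only this tensor relation, not Varentsov's RRMT algorithm and not the time-domain network mapping of this study; the tensor values and noise level are hypothetical.

```python
import random


def cross(p, q):
    """Sum of p * conj(q) over frequency-domain windows."""
    return sum(a * b.conjugate() for a, b in zip(p, q))


def solve_row(h, x, y):
    """Least-squares (a, b) minimizing sum |h - a*x - b*y|^2 via the 2x2 normal equations."""
    a00, a01 = cross(x, x), cross(y, x)
    a10, a11 = cross(x, y), cross(y, y)
    r0, r1 = cross(h, x), cross(h, y)
    det = a00 * a11 - a01 * a10
    return (r0 * a11 - a01 * r1) / det, (a00 * r1 - a10 * r0) / det


def magnetic_tensor(hx_loc, hy_loc, hx_ref, hy_ref):
    """2x2 tensor M such that [Hx, Hy]_local ~= M @ [Hx, Hy]_ref in one frequency band."""
    return [list(solve_row(hx_loc, hx_ref, hy_ref)),
            list(solve_row(hy_loc, hx_ref, hy_ref))]


# Simulated check with a hypothetical tensor and weak local noise
rng = random.Random(1)


def cgauss(scale=1.0):
    return complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale))


M_TRUE = [[1.2 + 0.1j, 0.05j], [-0.1 + 0j, 0.9 - 0.05j]]
hx_ref = [cgauss() for _ in range(2000)]
hy_ref = [cgauss() for _ in range(2000)]
hx_loc = [M_TRUE[0][0] * x + M_TRUE[0][1] * y + cgauss(0.01) for x, y in zip(hx_ref, hy_ref)]
hy_loc = [M_TRUE[1][0] * x + M_TRUE[1][1] * y + cgauss(0.01) for x, y in zip(hx_ref, hy_ref)]
M = magnetic_tensor(hx_loc, hy_loc, hx_ref, hy_ref)
```

Each row of M couples one local component to both reference components, which is the kind of cross-component constraint the paragraph above refers to.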
(3) In recent years, deep learning has found widespread application in the analysis of seismic data [69,70], gravity data [71], and various other geophysical field observations, demonstrating highly effective processing and denoising capabilities for time series data. The LSTM network model training and signal reconstruction method proposed in this study, which is based on underground EM observation data, can be extended to other deep underground observation experiments involving multiple geophysical fields. Such an extension would involve adapting model training and processing to individual geophysical fields or to multiple coupled fields, so that the method can serve as a preprocessing tool for capturing weak signals and addressing scientific challenges. It is important to recognize, however, that time series from different geophysical field observations exhibit diverse patterns and are subject to different types of noise. Consequently, model parameters will need to be adjusted when applying this method to other multi-geophysical field data, and each field may require a tailored approach to achieve accurate signal reconstruction and denoising.
(4) The complex noise in the reconstructed data is effectively suppressed by the LSTM network model, benefiting from the attenuation of high-frequency interference as it penetrates the conductive cover layer. However, this attenuation also causes the frequency content of the training data to differ from that of the target data, resulting in an amplitude loss of high-frequency information in the reconstructed signal. Therefore, when high-frequency information is the processing target, the method is not directly applicable, and further amplitude correction of the effective signal becomes necessary.
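The frequency dependence of this attenuation follows from the plane-wave skin depth δ = √(2ρ/(ωμ₀)) ≈ 503 √(ρ/f) m: in a homogeneous half-space the field amplitude decays as exp(−z/δ), so high frequencies are attenuated far more strongly over the same cover thickness. The sketch below evaluates this for a hypothetical 100 Ω·m cover at 800 m depth. Both values are illustrative rather than those of HNLab, and the calculation ignores layering and reflected fields.

```python
import math

MU0 = 4e-7 * math.pi   # vacuum permeability in H/m (classical value)


def skin_depth(rho, f):
    """Plane-wave skin depth (m) for resistivity rho (ohm*m) at frequency f (Hz)."""
    omega = 2 * math.pi * f
    return math.sqrt(2 * rho / (omega * MU0))


def amplitude_ratio(depth, rho, f):
    """Fraction of the surface amplitude left at `depth` m in a homogeneous half-space."""
    return math.exp(-depth / skin_depth(rho, f))


RHO, DEPTH = 100.0, 800.0   # hypothetical cover resistivity (ohm*m) and thickness (m)
for f in (1.0, 10.0, 100.0, 1000.0):
    print(f"{f:7.1f} Hz: skin depth {skin_depth(RHO, f):7.0f} m, "
          f"remaining amplitude {amplitude_ratio(DEPTH, RHO, f):.3f}")
```

With these assumed values, about 85% of the surface amplitude remains at 1 Hz, about 20% at 100 Hz, and less than 1% at 1000 Hz, which is consistent with the loss of high-frequency amplitude described above.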
Figure 1.
Recurrent neural networks (RNNs) and unfolding in time.
Figure 2.
Structure of Long Short-Term Memory unit.
Figure 3.
Flowchart of electromagnetic data reconstruction using LSTM neural network based on underground observations.
Figure 4.
Data reconstruction from clean time series superimposed with different types of noise: (a) Charge and discharge triangle waves; (b) Square waves; (c) Gaussian noise; (d) Peak noise. Red lines represent the noisy data, blue lines represent the clean data, and black lines represent the reconstructed data.
Figure 5.
Data reconstruction from a time series superimposed with all four types of noise. Red lines represent the noisy data, blue lines represent the clean data, and black lines represent the reconstructed data.
Figure 6.
Reconstruction of ground Hx component using original data. (a) One-minute data; (b) Five-second data. Light orange lines represent the original ground data, orange lines represent the reconstructed data, and black lines represent the underground data.
Figure 7.
Reconstruction of ground Hx component using 100 Hz low-pass filtered data. (a) One-minute data; (b) Five-second data. Light orange lines represent the ground data, and orange lines represent the reconstructed data.
Figure 8.
Reconstruction of ground Hx component using 1 Hz low-pass filtered data. Light orange lines represent the ground data, and orange lines represent the reconstructed data.
Figure 9.
Power spectral density of ground and reconstructed data. The ground data used are the original data (a), the 100 Hz low-pass filtered data (b), and the 1 Hz low-pass filtered data (c).
Figure 10.
Probability density functions of ground and reconstructed data. The ground data used are the original data (a), the 100 Hz low-pass filtered data (b), and the 1 Hz low-pass filtered data (c).
Figure 11.
Results of the continuous wavelet transform of ground data (left panels) and corresponding reconstructed data (right panels). The ground data used are the original data (a), the 100 Hz low-pass filtered data (b), and the 1 Hz low-pass filtered data (c).
Table 1.
Evaluation metrics for different noise suppression effects.
| Noise type | MAPE (%) | SMAPE (%) |
|---|---|---|
| Charge and discharge triangle wave | 1.9222 | 0.6738 |
| Square wave | 1.7582 | 0.5977 |
| Gaussian noise | 2.0160 | 0.5724 |
| Peak noise | 1.6031 | 0.5099 |
| Combined noise | 3.8490 | 1.0905 |