2.1. Mathematical Formulation of the Algorithm
For digital image processing systems, the input is a two-dimensional image, which can be abstracted as a function of two variables, denoted f(x, y). Here, (x, y) represents the pixel position and f(x, y) represents the signal intensity at that position. After passing through an image processing system, the output is again a function of two variables; image processing is therefore essentially an operation that maps one such function to another.
The spatial domain method operates directly on the pixels of an image. If we denote the output function as g(x, y), then at position (x, y) the value of g is highly correlated with the pixel at the corresponding position in the input image and with its neighborhood. The mathematical model of the spatial domain method is shown in Equation (1).

g(x, y) = T[f(x, y)]  (1)
In Equation (1), T: the signal processing operator (the processing system); (x, y): the pixel position; f(x, y): the input signal intensity at position (x, y); g(x, y): the output signal intensity at position (x, y).
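For concreteness, a minimal Python sketch (our own illustration, not the paper's system) of one such spatial-domain operator T, here a simple 3×3 neighborhood mean:

```python
import numpy as np

def T_mean(f):
    """A simple spatial-domain operator: g(x, y) = T[f(x, y)], where T
    averages f over the 3x3 neighborhood of each pixel position (x, y)."""
    g = np.zeros(f.shape, dtype=float)
    rows, cols = f.shape
    for x in range(rows):
        for y in range(cols):
            g[x, y] = f[max(x - 1, 0):x + 2, max(y - 1, 0):y + 2].mean()
    return g
```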
Consider a signal with the following characteristics: continuous gradient ascent and descent segments are regarded as valid, feature-bearing signals; sporadic or short-duration ascent and descent segments are regarded as interference (non-feature) signals; and non-feature signals may exhibit directional interference. The output signal is required to retain the trend and amplitude of the feature signal, preserving the shape of the corresponding raw signal. The algorithm model we establish to address these requirements is shown in Equation (2).

y_i = α·y_{i-1} + (1 − α)·|x_i − x_{i-1}|  (2)

In Equation (2), x_i: the amplitude of the i-th input signal; i: the index of the input signal; y_i: the amplitude of the i-th output signal; α: a weight power, α ∈ (0.9, 1).
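A minimal Python sketch of this filter, assuming the form of Equation (2) as reconstructed above (the name smooth_filter is ours):

```python
import numpy as np

def smooth_filter(x, alpha=0.94):
    """Recursive smoothing filter of Equation (2):
    y[i] = alpha * y[i-1] + (1 - alpha) * |x[i] - x[i-1]|, alpha in (0.9, 1)."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for i in range(1, len(x)):
        y[i] = alpha * y[i - 1] + (1 - alpha) * abs(x[i] - x[i - 1])
    return y
```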
2.2. Meaning and Purpose
To facilitate the explanation of the meaning and purpose of the algorithm, we present Equation (2) in a logical form, as shown in Equation (3).

y_i = f1 + f2  (3)

In Equation (3), y_i: the i-th output value; f1: the recursive smoothing component; f2: the trend-retention component.
We view the output of Equation (2) as the superposition of two function results: one is the function f1 = α·y_{i-1}, and the other is the function f2 = (1 − α)·|x_i − x_{i-1}|. The discrete form of function f1 is shown in Equation (4).

y_i = α·y_{i-1}  (4)

In Equation (4), y_i: the value of the i-th output signal; α: the weight power, α ∈ (0.9, 1).
In Equation (4), corresponding to function f1, the operation is self-referential: a recursive algorithm in which the current output depends on the previous output. When α < 1, the signal is attenuated. Because each output is a fraction of the previous output, the output remains relatively smooth even when the signal changes sharply between adjacent time periods. As α approaches 1, the smoothing effect becomes more pronounced: the output depends more heavily on the previous output, so a larger α better suppresses signal fluctuations, reduces the impact of noise, and stabilizes the signal. With α fixed, the current output is a fixed proportion of the previous output, which further helps reduce signal fluctuations and noise. For this reason, Equation (2) restricts α to a more effective range.
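As a quick numerical illustration of this attenuation (our own example), iterating the recursive term of Equation (4) alone with α = 0.94 decays a unit output geometrically:

```python
# Attenuation of the recursive term alone: y[i] = alpha * y[i-1].
alpha, y = 0.94, 1.0
for i in range(1, 6):
    y = alpha * y
    print(f"step {i}: y = {y:.4f}")  # 0.9400, 0.8836, 0.8306, 0.7807, 0.7339
```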
The discrete form of function f2 is shown in Equation (5).

y_i = (1 − α)·|x_i − x_{i-1}|  (5)

In Equation (5), y_i: the i-th predicted value; x_i: the i-th actual observed value.
Equation (5), which represents function f2, maintains the continuous trend. When the input signal changes continuously, whether ascending or descending, Equation (5) reflects this change and adds it to the output sequence, ensuring that the output signal stays consistent with the continuous trend of the input. If the input signal is continuously increasing, the output sequence also continues to rise: owing to the f1 component, the output follows the overall trend of the input smoothly, while the f2 component retains that trend. Likewise, if the input signal is continuously decreasing, the output sequence also continues to decrease. The waveform shape of the output thus aligns with that of the input signal, as the combined effect of the f1 and f2 components.
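A small usage sketch of this trend-following behaviour (our own example), reusing the smooth_filter function sketched after Equation (2) on a steadily rising input:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 11)      # steadily rising input
y = smooth_filter(x, alpha=0.94)   # filter sketched after Equation (2)
print(np.all(np.diff(y) > 0))      # True: the output rises with the input
```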
2.3. Parameter Settings Based on Difference Analysis
Parameter settings are crucial for the algorithm model. In this study, we determine the value of the weight constant α by difference-testing analysis, guided by the empirical range of weight constants in smoothing-filter algorithms and by the design of each part of the model.
The purpose of the algorithm model established in this study is to preserve the main features of valid signals while smoothing and attenuating noise and interference, thereby highlighting the primary characteristics of the observations and reducing the amplitude of noise signals. Parameter setting must therefore balance both aims: reducing interference signals without affecting the main feature signals. Based on this analysis, the rationale for the parameter settings is shown in Figure 2.
As shown in Figure 2, our algorithm serves two purposes simultaneously: preserving the trend of the raw data, and reducing noise and smoothing the data to minimize interference with subsequent signal processing. Signals with small amplitude fluctuations therefore need smoothing and noise reduction, while the shape of the main feature signals must be retained. The setting of the parameter α is crucial to achieving both objectives. In many existing preprocessing algorithms, parameters are set from empirical values; in contrast, we determine α by statistical difference analysis, which enhances the stability and adaptability of the model.
By conducting a difference test between the observed (raw) data and the predicted (model-output) data, we obtain the differences between the two datasets. In statistics, if the significance p of the comparison is not less than 0.05, the difference between the two datasets is considered not significant. In terms of our algorithm's functionality, this indicates that the trend of the main data features has been preserved, confirming a successful setting of α. Where the differences are significant, the cause is the change in the non-feature signal segments after attenuation, which drives the significant result. To prevent these segments from interfering with the parameter setting, we reset them: based on the difference analysis, these signals are set to the corresponding filtered data values, reducing their impact on the setting of α. The process iterates, refining the observed and predicted data through difference testing and adjusting α with a precision of 0.01, until p satisfies the statistical condition (p ≥ 0.05); this completes the setting of parameter α.
As a decay factor, the parameter α typically lies in (0, 1). When α is close to 1, the absolute difference between the current and previous data points contributes less to the result, yielding a smoother model that leans more on past data. Conversely, as α decreases, the model becomes more flexible, placing greater emphasis on recent changes. For models that require both fast responsiveness and high stability, or when the data are highly autocorrelated, larger α values better capture the dynamic characteristics of the data. Here we limit α to the interval (0.9, 1), with a precision of 0.01. This ensures that the model reacts to changes in the most recent data points without relying on them excessively, while still retaining the influence of historical data. The model thus remains relatively stable when processing rapidly changing data, striking a better balance between the weights of historical and recent data.
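To make the effect of this range concrete, a brief sketch (our own example, reusing smooth_filter) comparing a larger and a smaller α on a noisy test signal:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 500)) + 0.2 * rng.standard_normal(500)

y_stable = smooth_filter(x, alpha=0.99)  # heavier smoothing, slower response
y_fast = smooth_filter(x, alpha=0.91)    # lighter smoothing, tracks recent changes
```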
Based on the analysis above, we preliminarily set α within this range; in this study, the initial value is α = 0.94. We then conduct a difference test between the observed and predicted values using this preset α. The steps of the difference test are outlined in Figure 3. Because the observed values (raw data) and predicted values (model output) are paired samples, a paired-sample test is used to examine the differences between the two datasets. We compute the differences between the observed and predicted values, denote them d, and assess whether d follows a normal distribution. Since d does not exhibit normal-distribution characteristics, we use a paired-sample Wilcoxon test to compare the two datasets.
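A minimal sketch of this test sequence with SciPy, assuming x (observed) and y (predicted) are NumPy arrays; the Shapiro–Wilk test stands in here for whichever normality test Figure 3 specifies:

```python
from scipy.stats import shapiro, wilcoxon

d = x - y                  # paired differences
_, p_norm = shapiro(d)     # normality check on d
if p_norm < 0.05:          # d is not normally distributed
    _, p = wilcoxon(x, y)  # paired-sample Wilcoxon signed-rank test
```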
We then check whether the p-value of the test exceeds 0.05. If p > 0.05, the parameter setting has succeeded: interference segments with noise and fluctuations have been smoothed out while the original feature signals are preserved. If p is not greater than 0.05, the result is most likely driven by the changes in the interference signals, and the observed data must be normalized.
This normalization involves two steps: (1) resetting negative values in the observed data: since the output of the algorithm model has already removed sign interference, negative observed values are set to the corresponding values of the output signal; and (2) resetting observed signals near the equilibrium position (i.e., close to the zero line) to the corresponding predicted values, which reduces the differences caused by the smoothing of near-equilibrium signals during model processing.
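A sketch of these two reset steps, assuming x and y are the observed and predicted arrays and eps is a hypothetical threshold for "near the zero line" (neither name appears in the paper):

```python
import numpy as np

def reset_observed(x, y, eps):
    """Reset non-feature segments of the observed data to the model output."""
    x_reset = x.copy()
    neg = x_reset < 0                    # step 1: negative observed values
    x_reset[neg] = y[neg]
    near_zero = np.abs(x_reset) <= eps   # step 2: values near the zero line
    x_reset[near_zero] = y[near_zero]
    return x_reset
```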
Finally, the reset observed signals are paired with the original predicted signals for a paired-sample Wilcoxon test. We iteratively adjust α with a precision of 0.01 until p exceeds 0.05; the current α is then selected as the coefficient of the algorithm, completing the parameter setting.
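Combining the pieces, a hedged sketch of this iterative adjustment, reusing the smooth_filter and reset_observed helpers introduced above:

```python
import numpy as np
from scipy.stats import wilcoxon

def tune_alpha(x, eps, alphas=np.arange(0.90, 1.00, 0.01)):
    """Scan candidate alpha values at a precision of 0.01 and return the
    first one whose paired Wilcoxon test yields p >= 0.05."""
    for alpha in alphas:
        y = smooth_filter(x, alpha)
        x_reset = reset_observed(x, y, eps)
        _, p = wilcoxon(x_reset, y)
        if p >= 0.05:
            return alpha, p
    return None, None  # no alpha in the range passed the test
```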
Taking a single clinical ECG sample from Lead II, with a sampling frequency of 500 Hz and a duration of 10 s, as the observed data x (sourced from Zhongshan Hospital affiliated to Fudan University and recorded with an Inno-12-U ECG device), we illustrate the difference testing underlying the parameter setting. Initially, we set α = 0.94 as the preset value. After running the algorithm model, we obtain the predicted values y. We calculate the difference d between x and y and perform a normality test on d. The results are shown in Table 1: the significance Sig. (p-value) = 0.000 < 0.05, indicating that d is non-normally distributed.
Since d is non-normally distributed, we perform a paired-sample Wilcoxon test on the two datasets x and y. The results are shown in Table 2. The significance Sig. (p-value) = 0.000 < 0.05 indicates that y has changed substantially through smoothing and noise reduction, producing a significant difference between x and y.
At this point, we cannot yet determine whether the value of α meets the requirement of preserving the main features and trends of x, so x must be reset. In this example, because the amplitudes of the characteristic waveforms in the ECG data differ greatly, we first set the data points in x whose amplitudes do not exceed 0.63 times the maximum amplitude of the observed data to the corresponding values of y, and denote the reset observed data x′. We then conduct a Wilcoxon test on x′ and y. If the resulting p-value exceeds 0.05, there is no significant difference between the model-processed data and the reset observed values, and the main characteristics and trends of the observed data have been preserved.
To further demonstrate the importance of the parameter setting and its impact on the results, we repeat the process with α values of 0.93 and 0.95. The results are shown in Table 3. When α = 0.94, p = 0.084 > 0.05, indicating no significant difference between the predicted data and the corresponding reset observed data. For α = 0.93 and α = 0.95, p = 0.001 and p = 0.000 respectively, both below 0.01, indicating a highly significant difference between the predicted data and the corresponding reset observed data. Therefore, for this type of data, the parameter α should be set to 0.94 when applying this algorithm.
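As a closing usage illustration (our own sketch, reusing the helpers above), the same three-way comparison can be scripted directly, with the 0.63-times-maximum-amplitude reset threshold of this example:

```python
eps_63 = 0.63 * np.max(np.abs(x))  # reset threshold used in this example
for alpha in (0.93, 0.94, 0.95):
    y = smooth_filter(x, alpha)
    x_reset = reset_observed(x, y, eps_63)
    _, p = wilcoxon(x_reset, y)
    print(f"alpha = {alpha:.2f}: p = {p:.3f}")
```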