4.3. Degradation Monitoring
This section describes the methodology used to monitor degradation in the viscose fiber production process introduced in Section 4.2.
Figure 1 illustrates the approach employed for monitoring the degradation of the process over time.
As previously discussed in
Section 4.1, the production process comprises two distinct phases: the filtration phase and the rejection phase. The duration of the rejection cycle remained constant at
for each filter group, which was used for further analysis. The duration of the filtration cycle, however, varied with factors such as the sieve's condition (new or old), the amount of material blocking the sieve, and the differential pressure, as already discussed in
Section 4.1. To accommodate this variability, the average duration of the filtration cycle over the month following a sieve change was computed. The calculated average duration was found to be
and was used in the further analysis. This information was needed to incorporate lags per feature as additional features in the dataset used to obtain the causal graphs, as described below.
Sensor Data Preprocessing: The data obtained from the sensors, depicted in Figure 6, undergo the pre-processing steps visualized in Figure 7. First, the dataset was divided into two phases based on the respective times of filtration and rejection, as shown in the Data Segmentation part of Figure 7.
To address the irregular sampling frequency inherent in the rejection and filtration phases, we applied data resampling, as depicted in Figure 7. Specifically, the rejection-phase data was resampled at a rate of 1 second, while the filtration-phase data was resampled at a rate of 7 seconds. These resampling rates were recommended by domain experts to match the desired precision of the analysis, in particular with respect to the dynamic behavior of the process: the dynamics in the rejection phase vary faster than those in the filtration phase. For readability, we focus here on the rejection phase; the results for the filtration phase can be found in the Appendix.
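The resampling step can be sketched with pandas; the column names, timestamps, and values below are illustrative, not the plant's actual sensor tags:

```python
import pandas as pd

# Hypothetical sensor readings with irregular timestamps (names are
# illustrative, not the plant's actual tag names).
readings = pd.DataFrame(
    {"p1": [4.1, 4.3, 4.2, 4.6], "pdiff": [0.9, 1.1, 1.0, 1.4]},
    index=pd.to_datetime(
        ["2022-08-09 10:00:00.2", "2022-08-09 10:00:01.7",
         "2022-08-09 10:00:02.9", "2022-08-09 10:00:08.5"]
    ),
)

# Rejection-phase data: resample onto a regular 1-second grid (mean of the
# samples falling into each bin, gaps forward-filled).
rejection = readings.resample("1s").mean().ffill()

# Filtration-phase data: coarser 7-second grid, reflecting its slower dynamics.
filtration = readings.resample("7s").mean().ffill()
```

The choice of the bin aggregate (mean) and gap handling (forward fill) is our assumption; the paper does not specify them.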
Causal Discovery: At a frequency of
, the rejection group data was obtained after completing the preprocessing step. This data was then partitioned by month, and each month was further divided into four distinct weeks, as shown in Figure 8. This segmentation was implemented to monitor degradation in the viscose fiber production process on a weekly basis. The decision to operate at a weekly frequency was motivated by the computational cost of causal graph construction; the computational complexity of FCI is discussed below. Daily monitoring was deemed impractical, while monthly intervals were considered too infrequent, risking losses in the efficiency of the entire viscose fiber production system. The weekly basis therefore provided a balanced and effective approach for timely degradation assessment.
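The fixed four-week split of each month can be sketched as follows; this is a minimal stdlib sketch, and the bucketing rule (days beyond 28 folded into the fourth week) is our assumption, since the text does not specify how partial weeks are handled:

```python
from datetime import datetime, timedelta

def week_of_month(ts: datetime) -> int:
    """Assign a timestamp to one of four fixed weekly buckets within its
    month: days 1-7 -> week 0, 8-14 -> 1, 15-21 -> 2, 22 onward -> 3."""
    return min((ts.day - 1) // 7, 3)

# Illustrative: partition one month of daily timestamps into weekly groups.
august = [datetime(2022, 8, 1) + timedelta(days=d) for d in range(31)]
weeks = {}
for ts in august:
    weeks.setdefault(week_of_month(ts), []).append(ts)
```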
To monitor deterioration, the dataset described in Section 4.2 and Table 1, comprising 7 features, was used. In addition, 40 lags per feature were included as extra features, where a lag represents the time delay between consecutive observations, capturing the temporal relationship between a variable and its past values, as described in Section 3.2. Introducing lags as supplementary variables adapts FCI to time series data: it integrates temporal information into the causal discovery process, allowing FCI to account for the temporal dependencies present in time series and to uncover causal relationships that extend across time points. The number of lags was chosen based on the total duration of the rejection phase (
), along with its respective sampling frequencies (
), to ensure coverage of the entire duration of the rejection phase in the construction of the corresponding causal graphs. To ensure comparability between the results for the rejection and filtration phases, domain experts recommended using the same number of lags for both phases. Consequently, with a total of 40 lags and a sampling rate of 7 seconds, almost the entire duration of the filtration phase (approximately
) is covered in the construction of the corresponding causal graphs. This harmonization of lag features enables consistent analysis across both phases of the production process.
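The lag construction can be illustrated with pandas: shifting each of the 7 features by 1..40 steps yields 7 x 41 = 287 columns per time point, matching the feature count reported in the text. Column names and data below are hypothetical:

```python
import numpy as np
import pandas as pd

def add_lag_features(df: pd.DataFrame, n_lags: int) -> pd.DataFrame:
    """Append n_lags shifted copies of every column, so each time point
    carries its own past as extra features; rows without a full lag
    history are dropped."""
    lagged = {f"{col}_lag{k}": df[col].shift(k)
              for col in df.columns for k in range(1, n_lags + 1)}
    return pd.concat([df, pd.DataFrame(lagged, index=df.index)],
                     axis=1).dropna()

# 7 original features with 40 lags each -> 287 columns per time point.
rng = np.random.default_rng(0)
base = pd.DataFrame(rng.normal(size=(100, 7)),
                    columns=[f"f{i}" for i in range(7)])
lagged = add_lag_features(base, n_lags=40)
```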
Therefore, the total number of features required to construct the causal graphs for both the rejection and filtration phases amounted to 287 per time point. With such a large number of features (287) per causal graph, and given the computational complexity of FCI, only two days of data were used to represent each week: the first two successive complete days of the week served to create the causal graphs for both the rejection and filtration phases. This resulted in around 19,000 samples with 287 features each; constructing these graphs with FCI took approximately 6 hours. During graph construction, domain knowledge was incorporated into FCI, enforcing the principle that present or future events cannot influence past events. This ensured that the causal graphs accurately reflect the causal relationships inherent in the dynamic production process.
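The temporal domain knowledge handed to FCI (present or future events cannot influence past events) can be expressed as a forbidden-edge predicate over the lagged feature names. A minimal sketch, assuming column names of the form `feature_lagK` (the naming scheme is our assumption; the FCI implementation itself is not shown):

```python
def lag_of(name: str) -> int:
    """Extract the lag index from a column name of the form 'feature_lagK';
    unlagged columns represent the present (lag 0)."""
    if "_lag" in name:
        return int(name.rsplit("_lag", 1)[1])
    return 0

def edge_is_forbidden(src: str, dst: str) -> bool:
    """Temporal background knowledge: an edge from a more recent time slice
    (smaller lag) into an older one (larger lag) would mean the present
    influencing the past, and is therefore forbidden."""
    return lag_of(src) < lag_of(dst)
```

Contemporaneous edges (equal lags) remain allowed; only edges pointing backward in time are ruled out.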
Causal Graphs and Reference Causal Graph: With the approach described above, a total of 19 causal graphs were generated, each representing a specific week of each month from August (after the sieve was changed) to December 2022, as shown in the Causal Graphs Stage in Figure 9.
To effectively monitor the degradation of the process over time, a reference graph was pivotal. This reference graph would represent the normal operating scenario when the system functions as expected by the domain experts. The selection of such a reference graph is crucial for an accurate comparison of the graphs generated for consecutive weeks.
The criteria for choosing the reference graph involved selecting a graph that is close to the date when the sieve was changed and that is similar to the causal graphs of the remaining weeks and months. The similarity between graphs was quantified using the Jaccard similarity explained in Section 3.3, where a score of 0 indicates complete dissimilarity and a score of 1 signifies identical graphs. The Jaccard similarity score was calculated while considering the direction of the edges between features, as FCI generates different types of edges, as shown in Figure 2.
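A minimal sketch of the Jaccard similarity over edge sets, with each causal graph represented as a set of (source, target, edge type) triples so that edge direction and FCI's different edge marks are taken into account (the toy graphs and edge notation are illustrative):

```python
def jaccard_similarity(g1: set, g2: set) -> float:
    """Jaccard similarity between two causal graphs represented as sets of
    (source, target, edge_type) triples: |intersection| / |union|."""
    if not g1 and not g2:
        return 1.0
    return len(g1 & g2) / len(g1 | g2)

# Toy weekly graphs (feature names and edge marks are illustrative).
week_a = {("p1", "pdiff", "-->"), ("p1", "p2", "o->")}
week_b = {("p1", "pdiff", "-->"), ("p1", "p2", "<->")}
score = jaccard_similarity(week_a, week_b)  # 1 shared edge of 3 distinct
```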
A heatmap depicting the Jaccard similarity scores for different combinations of reference graphs during the rejection phase is presented on the left-hand side of Figure 10. This figure illustrates the computation of Jaccard similarity scores for the various graphs used as reference candidates. The iterative process entails selecting one graph from all causal graphs as a reference and evaluating its similarity against all other graphs to identify the one exhibiting the highest resemblance to the others. In particular, self-referencing is excluded (a graph is not compared against itself), and comparisons with graphs occurring before the reference are excluded as well, to focus solely on monitoring degradation from the optimal state onward. Consequently, the heatmap is configured with only
entries, where
, corresponding to the total number of causal graphs.
On the right-hand side of Figure 10, boxplots depict the distribution of Jaccard similarity scores when individual graphs are taken as the reference and compared with the others. The reference graph should be close to the date of the sieve change and show a high median and low variance in its Jaccard similarity scores, as seen on the right-hand side of Figure 10. This selection is crucial: the reference graph should represent the ideal operating condition and be highly similar to the other graphs, given that degradation is a gradual process. A higher median ensures greater similarity between the reference graph and the others, reflecting the desired operational state, while lower variance indicates less variation among the graphs, in line with the gradual nature of degradation.
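The selection rule (higher median, ties broken by lower variance) can be sketched as follows; the candidate names and score distributions are hypothetical, and proximity to the sieve change date is still judged separately by the experts:

```python
from statistics import median, pvariance

def choose_reference(scores_per_graph: dict) -> str:
    """Pick the candidate reference graph whose Jaccard similarity scores
    against the later graphs have the highest median, preferring lower
    variance when medians tie."""
    return max(scores_per_graph,
               key=lambda g: (median(scores_per_graph[g]),
                              -pvariance(scores_per_graph[g])))

# Hypothetical per-candidate score distributions.
candidates = {
    "09-11 Aug": [0.62, 0.60, 0.58, 0.61],
    "14-16 Aug": [0.55, 0.54, 0.57, 0.52],
}
best = choose_reference(candidates)
```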
Among the examined boxplots, the graph from 09-11 August 2022, highlighted in purple, shows the highest median and lies closest to the sieve change date. Although the graph from 14-16 August 2022 is also close to the sieve change date and exhibits similar variance in its Jaccard similarity scores, its median is lower than that of 09-11 August. Consequently, the graph from 09-11 August was selected as the reference for further analysis. This choice ensures that the reference graph captures the optimal operating condition while remaining consistent with the observed data dynamics.
Graph comparison: Once the reference graph was chosen, a comparative analysis was conducted against the graphs of subsequent time intervals using the Jaccard distance, as illustrated in the Graph Comparison Stage in Figure 9. The Jaccard distance was selected as the comparison measure, instead of the Jaccard similarity score, because it quantifies the differences between causal graphs over time, as detailed in Section 3.3. These differences stem from variations in the dynamics of the sieve due to its degradation during its operational span.
Figure 11 presents the comparison between the causal graphs and the reference graph (09-11 August) using the Jaccard distance for the rejection phase. Given the dynamic nature of the process, susceptible to variations over time, a trend analysis was performed on the Jaccard distance scores to monitor degradation in the production process. The observed positive trend indicates increasing degradation over time following the sieve change.
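The trend analysis can be sketched as a least-squares fit over the weekly Jaccard distances; the distance values below are hypothetical:

```python
import numpy as np

def trend_slope(distances) -> float:
    """Least-squares slope of the weekly Jaccard distances: a positive
    slope means growing dissimilarity from the reference graph, i.e.
    progressing degradation."""
    weeks = np.arange(len(distances))
    slope, _intercept = np.polyfit(weeks, np.asarray(distances), deg=1)
    return slope

# Hypothetical weekly Jaccard distances after the sieve change.
distances = [0.38, 0.40, 0.39, 0.45, 0.47, 0.52]
slope = trend_slope(distances)
```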
Interpretability: Our approach not only enables continuous monitoring of degradation in the viscose fiber production process but also allows domain experts to integrate their knowledge into the creation and interpretation of the causal graphs. As shown in Figure 12, this section focuses on interpreting the observed variations in the dynamics of the production process during degradation monitoring, using two distinct methods.
Visual Inspection of Causal Graphs for Root Cause Analysis: The first method involves visually examining the causal graphs to discern changes at specific time points. By setting a degradation threshold on the Jaccard distance, as demonstrated in Figure 11, domain experts can scrutinize changes and analyze the causal graph of the ongoing production process.
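Flagging weeks against such a threshold is straightforward; the weekly distances and the 0.5 threshold below are illustrative, since in practice the threshold would be set by the domain experts:

```python
def weeks_over_threshold(distances: dict, threshold: float) -> list:
    """Return the weeks whose Jaccard distance from the reference graph
    exceeds the degradation threshold, in chronological (insertion)
    order."""
    return [week for week, d in distances.items() if d > threshold]

# Hypothetical weekly Jaccard distances.
weekly = {"24-26 Sep": 0.46, "01-03 Oct": 0.58, "08-10 Oct": 0.51}
flagged = weeks_over_threshold(weekly, threshold=0.5)
```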
For example, considering the maximum Jaccard distance on 01-03 October in Figure 11, the causal graph for this date is compared with the reference graph (09-11 August). Figure 13 (a) and (c) show the aggregate causal graphs for the reference graph (09-11 August) and for 01-03 October, respectively. The complete causal graph is inherently dense, featuring 40 lags per feature. Because edges between feature pairs repeat over time as the graph unfolds, a simplified causal graph is presented that emphasizes connections between features over a single lag; only the unique patterns are illustrated in Figure 13 (a) and (c).
Upon thorough analysis, several notable changes emerge in the causal graph of 01-03 October, depicted in Figure 13(c), compared to the reference graph of 09-11 August, shown in Figure 13(a). One significant observation is the introduction of latent confounders in the causal graph of 01-03 October that are absent in the reference graph. An in-depth examination of the subset graphs for both dates, focusing on the features p1 and pdiff in Figure 13(b) and (d), reveals the emergence of a latent confounder influencing their relationship on 01-03 October, whereas none was present in the reference graph. This relationship is of crucial significance, as it triggers the initiation of the rejection and filtration phases, making the introduction of a latent confounder a critical observation.
The differential pressure (pdiff) signifies the disparity between the input pressure (p1) and the constant output pressure (p2). Thus, variations in p1 directly impact pdiff, given the constant nature of p2. When pdiff exceeds a certain threshold, rejection initiates; otherwise, filtration continues. However, the introduction of a latent confounder enables false switching between the rejection and filtration phases, impacting output quality in multiple ways. First, an increased number of filtrations and fewer rejections may indicate insufficient space within the sieve for new waste particles, leading to clogging, reducing the lifespan of the sieve, and degrading the output quality. Alternatively, excessive rejections may result in more frequent motor contact with the sieve during cleaning or backwashing, accelerating mechanical degradation and shortening the sieve's lifespan, which in turn diminishes output quality. This observation underscores the importance of identifying and addressing latent confounders to maintain process integrity and ensure optimal output quality.
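The switching rule described above can be stated compactly; the pressure values and threshold are illustrative, not plant calibration:

```python
def next_phase(p1: float, p2: float, threshold: float) -> str:
    """Differential-pressure switching rule as described in the text:
    pdiff = p1 - p2 with p2 held constant; rejection starts once pdiff
    exceeds the threshold, otherwise filtration continues."""
    pdiff = p1 - p2
    return "rejection" if pdiff > threshold else "filtration"
```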
Further examination shows a delayed connection between p1 and pdiff in the reference graph (Figure 13 (b)) that is absent in the causal graph for 01-03 October (Figure 13 (d)). Visual inspection thus provides domain experts with valuable insights into changes in feature relationships and serves as a starting point for further analysis.
The latent confounders exist not only between the features p1 and pdiff but also between p1 and p2. Additionally, new connections emerge in the causal graph for 01-03 October, as depicted in Figure 13 (c), that are not present in the reference causal graph shown in Figure 13 (a). This comprehensive analysis gives domain experts a deeper insight into the evolving dynamics of the process.
Monitoring Changes in Feature Relations Over Time: The second approach involves monitoring changes in the relationship between specific pairs of desired features over time. As previously mentioned, the connections between features p1 and pdiff play a crucial role in initiating the rejection and filtration phases. Therefore, observing the dynamics of these features over time can provide valuable insights before a significant event occurs.
The proposed visualization in Figure 14 provides an insightful depiction of the monitoring process over time. Notably, between 09-11 August and 07-09 September, no confounders or latent variables are observed between the features p1 and pdiff, as indicated in the corresponding heatmaps. However, a crucial development occurs from 14-16 September, highlighted in orange on the heatmaps: latent confounders appear in the causal graph. Knowing when these confounders emerged enables domain experts to focus root cause analysis on this timeframe and discern the underlying causes of such occurrences. Armed with this information, experts can strategize how to keep the process dynamics within the required specifications.
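Detecting the onset of a confounder between a chosen feature pair can be sketched by scanning the weekly edge sets for a bidirected edge, FCI's mark for a latent confounder; the toy graphs and edge notation below are illustrative:

```python
def first_confounder_week(weekly_graphs: dict, pair: tuple):
    """Scan weekly causal graphs (in chronological insertion order) and
    return the first week in which a bidirected edge '<->', indicating a
    latent confounder, appears between the given feature pair."""
    a, b = pair
    for week, edges in weekly_graphs.items():
        if (a, b, "<->") in edges or (b, a, "<->") in edges:
            return week
    return None

# Toy weekly edge sets (feature names and edge marks are illustrative).
graphs = {
    "07-09 Sep": {("p1", "pdiff", "-->")},
    "14-16 Sep": {("p1", "pdiff", "<->")},
    "21-23 Sep": {("p1", "pdiff", "<->"), ("p1", "p2", "<->")},
}
onset = first_confounder_week(graphs, ("p1", "pdiff"))
```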
Moreover, the visualization serves to highlight any new connections or confounders compared to previous causal graphs. This functionality allows domain experts to swiftly detect irregularities in process dynamics while considering the ideal operating scenario derived from the reference causal graph. By leveraging this visualization, experts can proactively address deviations from optimal process conditions, ensuring consistent performance and quality output.
Upon examination of Figure 14, a noticeable trend emerges: potential confounders proliferate over time. This underscores the significance of ongoing monitoring, enabling domain experts to discern abnormal behavior and initiate deeper investigations. By dynamically tracking these changes, valuable insights into the evolving relationships between features are gained, facilitating the early detection of anomalies or shifts in the production process dynamics. The inherent advantage of causal graphs lies in providing domain experts with targeted insights: from the heatmaps, experts can identify which causal graph to scrutinize, gain clarity on the underlying reasons for the observed changes, and focus their investigation so as to uphold output quality standards.