2.3. Two-Dimensional Extended ROS
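The schematic diagram of the ROS proposed in this paper is shown in Figure 2. For convenience, the variable direction of the process data is not shown in the figure, and each circle represents a sample point $\mathbf{x}_{i,k}$, i.e., the variable vector of the $i$-th batch at the $k$-th sampling time. The previous neighbors that may influence $\mathbf{x}_{i,k}$ include the past points in the time direction, $\mathbf{x}_{i,k-1}, \mathbf{x}_{i,k-2}, \ldots, \mathbf{x}_{i,k-F+1}$. They also include the past points in the batch direction, $\mathbf{x}_{i-1,k}, \mathbf{x}_{i-2,k}, \ldots, \mathbf{x}_{i-N+1,k}$, and the past points in both the time and batch directions, $\mathbf{x}_{i-1,k-1}, \mathbf{x}_{i-1,k-2}, \ldots, \mathbf{x}_{i-1,k-F+1}, \mathbf{x}_{i-2,k-1}, \ldots, \mathbf{x}_{i-2,k-F+1}, \ldots, \mathbf{x}_{i-N+1,k-F+1}$. Finally, they include the points of past batches whose time indexes are larger than $k$, $\mathbf{x}_{i-1,k+1}, \mathbf{x}_{i-1,k+2}, \ldots, \mathbf{x}_{i-1,k+B}, \mathbf{x}_{i-2,k+1}, \ldots, \mathbf{x}_{i-2,k+B}, \ldots, \mathbf{x}_{i-N+1,k+1}, \ldots, \mathbf{x}_{i-N+1,k+B}$. Here, $F$, $N$ and $B$ are the window sizes specified in Steps 1, 2 and 4 below. The region covering these points, which may influence the contribution of the current point to the final quality, is defined as the novel ROS. Its four parts are called the k-extended region of support (KROS), the i-extended region of support (IROS), the k-i-extended region of support (KIROS) and the back-extended region of support (BROS), respectively. Their locations are shown in Figure 2. The novel ROS, named the k-i-back-extended region of support (KIBROS), includes all four of these ROSs.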
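To make the four sub-regions concrete, the following Python sketch lists the (batch, sampling time) index pairs they contain for a current sample $(i, k)$. It assumes that $F$ and $N$ count the current sampling time and batch, so the windows reach back to $k-F+1$ and $i-N+1$. It also assumes that the BROS spans sampling times $k+1, \ldots, k+B$ of the past batches. The function name `ros_regions` is hypothetical.

```python
def ros_regions(i, k, F, N, B):
    """Hypothetical helper: (batch, time) index pairs of the four sub-regions.

    Assumed conventions: the time window covers k, k-1, ..., k-F+1, the batch
    window covers i, i-1, ..., i-N+1, and the back-extended part covers
    k+1, ..., k+B of the past batches. The current point (i, k) is excluded.
    """
    # KROS: earlier sampling times of the current batch
    kros = [(i, k - f) for f in range(1, F)]
    # IROS: the same sampling time in earlier batches
    iros = [(i - n, k) for n in range(1, N)]
    # KIROS: earlier sampling times of earlier batches
    kiros = [(i - n, k - f) for n in range(1, N) for f in range(1, F)]
    # BROS: later sampling times of earlier batches
    bros = [(i - n, k + b) for n in range(1, N) for b in range(1, B + 1)]
    return {"KROS": kros, "IROS": iros, "KIROS": kiros, "BROS": bros}
```

For example, with $i=5$, $k=10$, $F=3$, $N=2$ and $B=2$, the KROS, IROS, KIROS and BROS contain 2, 1, 2 and 2 points, respectively.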
In the work cited in [24], the ROS was defined as the data before the current time point in the current and previous batches, which includes the k-extended and k-i-extended ROSs in Figure 2. In this paper, by contrast, the ROS is redefined and extended to four sub-regions that cover all possible regions along the time and batch directions. Correspondingly, each observed element $\mathbf{x}_{i,k}$ is extended over the four specific ROSs.
In previous batch process research, the variable data at the k-th sampling time of all batches are formed into a time-slice matrix $\mathbf{X}_k$. This matrix is used to find the relationship between the process data and the quality data, and all time-slice matrices are then used to obtain the final quality prediction [25]. In this work, all samples in the ROSs are considered simultaneously when modeling each sampling point. The established model therefore contains the corresponding information, and the prediction accuracy can be further improved. Based on the data in these ROSs, a two-dimensional extended matrix is proposed and constructed in the following steps.
Step 1. Extension of samples in the k-extended region of support (KROS).
The past points in the time direction, that is, the samples in KROS, $\mathbf{x}_{i,k-1}, \mathbf{x}_{i,k-2}, \ldots, \mathbf{x}_{i,k-F+1}$, are used to construct the extended ROS in the time direction. The extended process data matrix is obtained by splicing these samples with $\mathbf{x}_{i,k}$. The quality data vector remains unchanged because it represents the final quality of the whole batch. Note that because the final quality can only be obtained at the end of each batch, the k index is not needed for the quality data. Here, $F$ represents the total number of sampling times included; in other words, $F-1$ sampling times are extended forward.
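A minimal NumPy sketch of the Step 1 extension follows. It assumes the batch data are stored as an array `X` of shape (I, K, J) (batches × sampling times × variables) with 0-based indices, and `extend_kros` is a hypothetical name. The row starts with $\mathbf{x}_{i,k}$ and continues backward in time. The quality value $y_i$ is left unchanged, as described above.

```python
import numpy as np


def extend_kros(X, i, k, F):
    """Hypothetical Step 1 (KROS) helper: splice x_{i,k}, x_{i,k-1}, ..., x_{i,k-F+1}.

    X is assumed to have shape (I, K, J): batches x sampling times x variables.
    Returns a 1-D array of length F * J, i.e. one row of the time-slice matrix.
    """
    if k - F + 1 < 0:
        raise ValueError("fewer than F sampling times are available up to k")
    window = X[i, k - F + 1:k + 1]      # times k-F+1, ..., k (ascending)
    return window[::-1].reshape(-1)      # reorder to k, k-1, ..., k-F+1 and flatten
```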
Step 2. Extension of samples in the i-extended region of support (IROS).
Next, the extended ROS is built in the batch direction. Consider the variable vector of the $i$-th batch at the $k$-th sampling time, $\mathbf{x}_{i,k}$. The variable vector of the $(i-1)$-th batch at the same sampling time, $\mathbf{x}_{i-1,k}$, is spliced behind $\mathbf{x}_{i,k}$. The variable vectors of the $(i-2)$-th, $(i-3)$-th, …, $(i-N+1)$-th batches are then spliced behind the result in turn, where $N$ is the number of batches included. This gives the extended process variable matrix; the quality matrix is established in the same way.
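Under the same storage assumption (`X` of shape (I, K, J), 0-based indices), the batch-direction splice of Step 2 can be sketched as follows. Both helper names are hypothetical. The quality splice $[y_i, y_{i-1}, \ldots, y_{i-N+1}]$ is an assumed reading of "the quality matrix is established in the same way", not a detail the text spells out.

```python
import numpy as np


def extend_iros(X, i, k, N):
    """Hypothetical Step 2 (IROS) helper: splice x_{i,k}, x_{i-1,k}, ..., x_{i-N+1,k}.

    X is assumed to have shape (I, K, J); returns a 1-D array of length N * J.
    """
    if i - N + 1 < 0:
        raise ValueError("fewer than N batches are available up to batch i")
    window = X[i - N + 1:i + 1, k]      # batches i-N+1, ..., i at time k
    return window[::-1].reshape(-1)      # reorder to i, i-1, ..., i-N+1 and flatten


def extend_iros_quality(y, i, N):
    """Hypothetical, assumed analogous splice of final qualities y_i, ..., y_{i-N+1}."""
    return y[i - N + 1:i + 1][::-1]
```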
Step 3. Extension of samples in the k-i-extended region of support (KIROS).
Combining the KROS and IROS introduced above, the following method constructs a two-dimensional extended matrix based on time-batch evolution information.
First, the data of the $i$-th batch are extended in the sampling-time direction as described in Step 1. Then, each of the $i$-th to $(i-N+1)$-th batches in the IROS is extended in the time direction in the same way. This yields the two-dimensional extended process data matrix as well as the extended quality data matrix.
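One way to picture Step 3 is as a batch × time block around the current sample. In the sketch below, row 0 reproduces the Step 1 extension and column 0 the Step 2 splice, and the remaining entries are the KIROS samples. The storage layout (`X` of shape (I, K, J)), the name `extend_kiros` and the row-major flattening order are assumptions for illustration.

```python
import numpy as np


def extend_kiros(X, i, k, F, N):
    """Hypothetical Step 3 helper: two-dimensional block for sample (i, k).

    Row n holds batch i-n at sampling times k, k-1, ..., k-F+1.
    X has assumed shape (I, K, J); the result has shape (N, F, J).
    """
    if i - N + 1 < 0 or k - F + 1 < 0:
        raise ValueError("window exceeds the available batches or sampling times")
    block = X[i - N + 1:i + 1, k - F + 1:k + 1]   # ascending batches and times
    return block[::-1, ::-1]                        # batches i..i-N+1, times k..k-F+1
```

Calling `extend_kiros(X, i, k, F, N).reshape(-1)` turns the block into a single row of length $N \cdot F \cdot J$ for a time-slice regression.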
Step 4. Extension of samples in the back-extended region of support (BROS).
The steps above use only the data up to the $k$-th sampling time. To further improve the accuracy of quality prediction, the data after the $k$-th sampling time are also considered. The backward extended matrix is constructed symmetrically to the two-dimensional extended matrix of Step 3. However, the data of the current $i$-th batch after the $k$-th sampling time cannot be obtained during modeling. Because batches of the same batch process are similar, the data of the $(i-1)$-th batch after the $k$-th sampling time are used as a substitute to fill the corresponding positions of the current batch. A schematic diagram of this process is shown in Figure 3.
Applying this method in both the batch and sampling-time directions gives a backward extended process data matrix and a backward extended quality matrix, where $B$ is the number of backward samples included.
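The substitution rule of Step 4 can be sketched as follows. The block covers batches $i, i-1, \ldots, i-N+1$ at sampling times $k+1, \ldots, k+B$, and the row of the current batch is filled with batch $i-1$. The time range $k+1, \ldots, k+B$, the array layout `X` of shape (I, K, J) and the name `extend_bros` are assumptions for illustration. The backward quality matrix is not shown.

```python
import numpy as np


def extend_bros(X, i, k, N, B):
    """Hypothetical Step 4 (BROS) helper: batches i, ..., i-N+1 at times k+1, ..., k+B.

    Batch i has not yet reached the times after k, so its row is filled with
    the corresponding data of batch i-1. X has assumed shape (I, K, J);
    the result has shape (N, B, J).
    """
    K = X.shape[1]
    if i < 1 or i - N + 1 < 0 or k + B > K - 1:
        raise ValueError("window exceeds the available batches or sampling times")
    future = slice(k + 1, k + B + 1)                  # times k+1, ..., k+B
    substitute = X[i - 1, future][np.newaxis]         # stands in for batch i
    past = X[i - N + 1:i, future][::-1]               # batches i-1, ..., i-N+1
    return np.concatenate([substitute, past], axis=0)
```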
Step 5. Extension of samples in the k-i-back-extended region of support (KIBROS).
Finally, the sampled data in the whole range of the ROS are used to construct a k-i-back-extended matrix based on time-batch evolution information. For the process data at the $k$-th sampling time of the current $i$-th batch, $\mathbf{x}_{i,k}$, the data are first extended in the sampling-time direction. The k-i-back-extended process data matrix and the corresponding quality data matrix are then obtained.
The size of the extended matrix is determined by the parameters $F$, $N$ and $B$ defined above. In practice, the three parameters can be determined one at a time to achieve the best prediction performance.
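A sketch of the combined Step 5 block is given below. For each batch $i, i-1, \ldots, i-N+1$, it places sampling times $k, \ldots, k-F+1$ followed by $k+1, \ldots, k+B$, and the unobserved future of batch $i$ is filled with batch $i-1$ as in Step 4. The ordering of the two time segments, the array layout `X` of shape (I, K, J) and the name `extend_kibros` are assumptions, not the authors' exact construction.

```python
import numpy as np


def extend_kibros(X, i, k, F, N, B):
    """Hypothetical Step 5 (KIBROS) helper for sample (i, k).

    Each row (batch i, i-1, ..., i-N+1) holds sampling times k, k-1, ..., k-F+1
    followed by k+1, ..., k+B; batch i's future is taken from batch i-1.
    X has assumed shape (I, K, J); the result has shape (N, F + B, J).
    """
    K = X.shape[1]
    if i - N + 1 < 0 or k - F + 1 < 0 or k + B > K - 1 or (B > 0 and i < 1):
        raise ValueError("window exceeds the available batches or sampling times")
    rows = []
    for b in range(i, i - N, -1):                     # batches i, i-1, ..., i-N+1
        past_times = X[b, k - F + 1:k + 1][::-1]       # times k, ..., k-F+1
        source = i - 1 if b == i else b                # substitute for batch i
        later_times = X[source, k + 1:k + B + 1]       # times k+1, ..., k+B
        rows.append(np.concatenate([past_times, later_times], axis=0))
    return np.stack(rows)
```

Because $F$, $N$ and $B$ enter only as slice bounds, tuning them one at a time amounts to rebuilding this block with different bounds and re-fitting the model.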
As introduced before, the quality prediction model is built between the time-slice data and the quality data. In this work, each element of the time-slice data, $\mathbf{x}_{i,k}$, is extended according to the proposed ROSs (KROS, IROS, KIROS, BROS and KIBROS), yielding the extended time-slice data. Similarly, each element of the quality data, $y_i$, is extended in the same way, yielding the extended quality data. The quality prediction model is then built on these extended data.
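Assembling the extended time-slice data for one sampling time can be sketched as stacking one flattened row per batch. The sketch below takes any per-sample extension function `extend(X, i, k)`. It skips the first batches, which lack the history a window needs, and keeps the final qualities unextended. The name `extended_time_slice`, the `first_batch` parameter and the array layout (I, K, J) are hypothetical.

```python
import numpy as np


def extended_time_slice(X, y, k, extend, first_batch):
    """Hypothetical helper: extended time-slice data for sampling time k.

    extend(X, i, k) is any per-sample extension function; its output is
    flattened into one row per batch. Batches before first_batch lack the
    history the window needs and are skipped. y is kept unextended here.
    """
    rows = [np.asarray(extend(X, i, k)).reshape(-1) for i in range(first_batch, X.shape[0])]
    return np.vstack(rows), np.asarray(y)[first_batch:]
```

For instance, with a two-batch splice `lambda X, i, k: np.concatenate([X[i, k], X[i - 1, k]])` and `first_batch=1`, every row pairs a batch with its predecessor at time `k`.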
2.4. Quality Prediction Method and Evaluation Indicators
After the extended data matrix within each sliding window is constructed, PLS is used to perform the quality prediction. As mentioned before, the PLS model is built on the extended window between the extended time-slice data matrices and the extended quality data matrices. The subscript $c$ denotes the $c$-th phase of a multi-phase process, and the corresponding regression model is derived from the PLS model.
For a batch whose quality is to be predicted online, the extended matrix of its process variable matrix is established with the method proposed in Section 2.3. This extended matrix is then substituted into the regression model to obtain the predicted quality.
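The text does not state which PLS algorithm or how many latent variables are used. As a stand-in, the sketch below implements textbook NIPALS PLS1 for a single quality variable in NumPy. It would be fitted once per sampling time $k$ (and phase $c$) on the extended time-slice data, and a new batch's extended row is then passed to the prediction step. The names `pls1_fit` and `pls1_predict` and the choice of `n_components` are hypothetical, not the authors' code.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Hypothetical NIPALS PLS1 fit; returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean           # centred X and y residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)               # unit weight vector
        t = E @ w                            # score vector
        tt = t @ t
        p = E.T @ t / tt                     # X loading
        c = f @ t / tt                       # y loading
        E = E - np.outer(t, p)               # deflate X
        f = f - c * t                        # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression coefficients in X space
    return x_mean, y_mean, coef


def pls1_predict(model, X_new):
    """Predict quality for rows of (extended) process data."""
    x_mean, y_mean, coef = model
    return y_mean + (X_new - x_mean) @ coef
```

With as many components as the rank of the data, PLS1 reproduces ordinary least squares. Fewer components trade fit for robustness against the collinearity that the extension introduces.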
Several indicators are used to evaluate the performance of the prediction methods. The prediction accuracy of the quality prediction model at the $k$-th sampling time within the $c$-th phase is computed from three quantities. The first is the measured value of the quality variable of the $i$-th batch, $y_i$. The second is the value of that quality variable predicted by the model of the $k$-th sampling time within the $c$-th phase. The third is the mean of the quality values measured at the end of all batches. The accuracy ranges from 0 to 1. A value close to 1 indicates that the quality prediction model is precise, whereas a value close to 0 means that the variation in this phase cannot explain the variation in quality well.
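The exact accuracy formula is not recoverable from this text. A common choice consistent with the quantities above is the coefficient of determination, $1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$, and the sketch below assumes that form. The function name is hypothetical. Note that this form can fall below 0 on data the model was not fitted to.

```python
import numpy as np


def prediction_accuracy(y_true, y_pred):
    """Hypothetical helper: assumed 1 - SSE/SST accuracy for one sampling time.

    SST is taken about the mean of the measured final qualities.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    sst = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return 1.0 - sse / sst
```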
Furthermore, the average of this accuracy over the sampling points of the $c$-th phase, from its starting sampling point to its last one, is proposed to indicate the average impact of the $c$-th phase. In this paper, the offline quality analysis is carried out by comparing the accuracy at each sampling time with this phase average.
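Assuming the phase indicator is the arithmetic mean of the per-time accuracy over the phase's sampling points (inclusive of both ends), it can be computed as below. `phase_average` is a hypothetical name, and `accuracy_by_time` may be a list or a dict keyed by sampling time.

```python
def phase_average(accuracy_by_time, k_start, k_end):
    """Hypothetical helper: mean accuracy over sampling times k_start..k_end (inclusive).

    k_start and k_end are the first and last sampling points of the phase;
    accuracy_by_time maps each sampling time k to its accuracy value.
    """
    values = [accuracy_by_time[k] for k in range(k_start, k_end + 1)]
    return sum(values) / len(values)
```

Sampling times whose accuracy exceeds this average can then be singled out in the offline comparison.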
The root mean square error (RMSE) is used to verify the prediction accuracy of the model:

$$\mathrm{RMSE} = \sqrt{\frac{1}{r}\sum_{j=1}^{r} e_j^{2}}$$

where $r$ is the number of measurement groups and $e_j$ is the deviation between the predicted value and the real value of quality for the $j$-th group. As the formula shows, the smaller the RMSE, the more accurate the prediction. The online prediction precision of the methods is compared on the basis of RMSE.
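The RMSE over $r$ measurement groups can be sketched in NumPy as follows, where each deviation $e_j$ is the predicted minus the measured quality. The function name `rmse` is hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Hypothetical helper: root mean square error over r measurement groups."""
    e = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)  # deviations e_j
    return float(np.sqrt(np.mean(e ** 2)))
```

Evaluating it on the same test batches for every method gives directly comparable online precision figures.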