5.2. Influencing Factors
As noted above, the features considered in the machine-learning models were initially identified through a detailed qualitative analysis of the framing workstation, as outlined in a previous study [3]. Interestingly, all of the features, with the exception of the scheduled workload feature, were found to hold relevant information regarding PT in at least one of the machine-learning models. Notably, the NN model by itself found all of the features other than scheduled workload to be significant predictors of PT. However, this does not necessarily imply that scheduled workload does not affect PT. In fact, the data collected for this case study only covered days of operations that resulted in a limited number of observations for this feature. Since any potential effect of workload on PT would be indirect, several months’ worth of data may be necessary to ascertain whether or not it has an impact. Moreover, although the majority of the features were found not to have a statistically significant regression coefficient in the initially developed LR model, a number of them were found to have a potentially explainable effect on PT. The LR model trained using all features is expressed in Eq. (10).
As explained in Section 3.2, an increase in length, height, or thickness should logically increase PT, and this is reflected in the positive coefficients of these features. The size of these coefficients, however, does not accurately represent the independent effect of each feature on PT, given their high level of multicollinearity with other features (as evidenced by their high VIF values presented in Table 3). The coefficient of the daily sequence feature is near 0, but its slightly negative value could indicate that the worker becomes more “dialed in” to their work as they frame more panels. Moreover, it was to be expected that the complexity feature would have a positive coefficient, since framing a wall panel with a mix of different elements (e.g., an exterior multi-wall with one large door, one regular door, one large window, two regular windows, etc.) is less straightforward than framing a panel consisting of fewer types of elements (e.g., an interior wall that consists mainly of studs). Given its degree of multicollinearity, however, the size of its coefficient could not be interpreted independently. As for the height difference feature, its coefficient indicates that it takes an additional half-minute for every foot of difference in height, which is a reasonable amount of time to allow for adjusting the width of the framing machine. The negative coefficients of windows, large doors, and preassembled components are reasonable given the large positive coefficient of the length feature. Windows, doors, and the other preassembled components only need to be nailed to the frame. Since every linear foot of wall panel adds approximately 1.5 min to PT, such preassembled openings should reduce this duration, as they cover several linear feet of wall panel and only require nailing. It is not clear, however, why the coefficient of regular doors was found to be positive.
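The multicollinearity caveat above can be illustrated with a small numerical sketch. For the special case of a model with only two predictors, the variance inflation factor of each predictor reduces to 1 / (1 − r²), where r is the Pearson correlation between the two. The panel values below are hypothetical and are not taken from the case study data; the function names are likewise illustrative. The sketch only shows why a feature that moves closely with length (such as the number of regular studs) yields a high VIF, whereas a weakly related feature does not.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(x, y):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical panels (illustrative values only, not the study's data).
length_ft = [8, 10, 12, 14, 16, 18, 20]
regular_studs = [7, 8, 10, 11, 13, 14, 16]  # grows almost linearly with length
blocks = [2, 0, 3, 1, 0, 2, 1]              # largely unrelated to length

vif_studs = vif_two_predictors(length_ft, regular_studs)
vif_blocks = vif_two_predictors(length_ft, blocks)
print(f"VIF (length vs. regular studs): {vif_studs:.1f}")
print(f"VIF (length vs. blocks): {vif_blocks:.2f}")
```

With more than two predictors, r² is replaced by the R² obtained from regressing each feature on all of the others, which is the general definition behind VIF values such as those reported in Table 3.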
With regard to the number of cutting zones and number of drilled holes features, the positive signs of their coefficients are reasonable, as each cutting zone and hole requires that additional steps be performed by the machine. The relatively small coefficient of the number of drilled holes feature, however, may be attributable to its collinearity with the length feature (as previously explained). Similarly, since length is correlated with the number of regular studs, it is not surprising that the regular studs feature was found to have a small coefficient, but the reason for its negative sign is not clear. D-studs, L-studs, and M-studs should logically add more time to PT since they require more nails than regular studs. While the coefficients of the M-stud and L-stud features align with this logic, that of the D-stud feature is negative. The D-stud feature had the lowest number of records (it was found in just 14 panels, as per Table 1), and this may explain why the LR model was not able to identify a logical relationship between the D-stud feature and PT. The coefficient of the blocks feature is reasonable, as each block must be manually nailed by the operator. As for the ambient temperature feature, its coefficient was found to be small and negative. It is important to note, however, that the range of the recorded temperature data was not particularly wide (−5.5 °C to 12 °C); this range is not sufficient to test the hypothesis proposed by the operator consulted in this study, which is that workers tire more quickly and their work pace slows when the ambient temperature exceeds 20 °C, since there is no air conditioning in the factory [3]. The negative coefficient of this feature could be attributable to the fact that the temperature typically increases throughout the day, such that it could have a similar effect to that of the sequence feature on PT. In other words, although the NN model found the ambient temperature feature to contribute to PT, the information it identified in this feature is not necessarily related to the temperature itself. In future work, data from the summer season should be collected in order to examine the effect of high temperatures (i.e., > 20 °C) versus low temperatures on PT.
Regarding the day of the week feature, the negative coefficients corresponding to Tuesdays and Thursdays align with the observations of the operator consulted in this study. The operator mentioned that Tuesdays see a spike in production since workers tend to become “dialed in” after a more sluggish start to the week on Mondays, and that Thursdays tend to be more productive because workers are motivated to complete their work early in order to start their weekend (the case plant does not operate on Fridays) [3]. Finally, although the shift feature was discussed in a previous study [3], with fatigue expected to result in an increase in PT in the afternoon, its coefficient was negative in the LR model. Nevertheless, the negative sign of its coefficient is consistent with the negative signs of the coefficients of the sequence and temperature features, both of which increase in value throughout the day. These results could also imply that workers are more motivated in the afternoon to finish their work so they can return home, or that they are compelled to increase their pace of work in the afternoon in order to complete the set of wall panels scheduled for the day by the end of the shift.
This discussion demonstrates the value of investing time and effort in identifying and understanding the factors that influence cycle time prior to developing machine-learning models. When considering only the panel length to predict PT, the MAE obtained for the NN model based on cross-validation results was 2.18 min. Taking into consideration other geometric properties of the panels (i.e., height, width, number of cuts, etc.) was found to reduce this error to 1.94 min, representing an 11% reduction in the error. Moreover, taking into consideration the complexity, day, shift, ambient temperature, height difference, framing sequence, and date features was found to further reduce the error to 1.80 min, representing a total error reduction of 17% relative to the length-only model. As these findings suggest, understanding which factors influence cycle time helps to improve the accuracy of process time estimation systems.
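The percentage reductions quoted above follow directly from the three reported MAE values, with the length-only model taken as the baseline in both cases. The short sketch below reproduces that arithmetic; the function and variable names are illustrative and do not come from the study.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline value."""
    return (baseline - improved) / baseline

# MAE values (in minutes) reported for the NN model under cross-validation.
mae_length_only = 2.18  # panel length as the only feature
mae_geometric = 1.94    # length plus other geometric properties
mae_all = 1.80          # geometric features plus the remaining features

print(f"{relative_reduction(mae_length_only, mae_geometric):.0%}")  # 11%
print(f"{relative_reduction(mae_length_only, mae_all):.0%}")        # 17%
```

Both figures are measured against the 2.18 min baseline; the incremental gain from adding the non-geometric features alone (1.94 min to 1.80 min) is therefore smaller than the 17% total.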
5.3. The Performance of Different Machine-Learning Algorithms
The NN model was found to be the most suitable model for the case workstation, as previously explained. However, the LR model performed nearly as well as the NN model, and was able to reach that performance using only 11 features (compared to 23 features in the case of the NN model). As such, for the modeller who favors simplicity and interpretability, the LR model may be a more attractive choice. Nevertheless, the NN model’s ability to identify relationships between the various features and PT that were not apparent through scatter plots and Spearman’s coefficient, and that were not deemed important by the LR and RF models, is noteworthy. Moreover, given that the number of observations in the training dataset was relatively small for some of these features (e.g., large doors, garage doors, preassembled components), increasing the size of the dataset may further improve the performance of the NN model. It is also noteworthy that the NN model was not sensitive to multicollinearity, unlike the LR model, whose regression coefficients changed frequently and whose performance improved with each statistically insignificant and dependent feature removed from the model. As for the RF model, it generally showed inferior performance compared to the NN and LR models. Perhaps the most significant observation concerning the RF model was that it performed better after removing all the features that had low frequencies in the training dataset (with the exception of the block feature, which was found to be important given its actual significant effect on PT, since installation of blocking involves time-consuming manual work). Hence, significantly increasing the size of the dataset to include more panels containing the less frequent framing elements (e.g., preassembled components) may result in more of the relevant features being retained in the RF model and, in turn, in improved performance of the RF model.
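The observation regarding low-frequency features can be made concrete with a simple screening step. The following is a minimal sketch, not the procedure used in this study: it counts the number of panels in which each feature is nonzero and flags the features that fall below a chosen threshold. The panel records, feature names, and the `min_panels` threshold are all hypothetical.

```python
def low_frequency_features(panels, min_panels):
    """Names of features that are nonzero in fewer than min_panels panels."""
    counts = {}
    for panel in panels:
        for name, value in panel.items():
            counts.setdefault(name, 0)
            if value != 0:
                counts[name] += 1
    return sorted(name for name, n in counts.items() if n < min_panels)

# Hypothetical element counts per panel (illustrative values only).
panels = [
    {"regular_studs": 9, "blocks": 2, "d_studs": 0, "garage_doors": 0},
    {"regular_studs": 12, "blocks": 0, "d_studs": 1, "garage_doors": 0},
    {"regular_studs": 7, "blocks": 3, "d_studs": 0, "garage_doors": 0},
    {"regular_studs": 10, "blocks": 1, "d_studs": 0, "garage_doors": 1},
]

rare = low_frequency_features(panels, min_panels=2)
print(rare)
```

A screen of this kind is only a starting point. As the block feature shows, a feature may be worth retaining despite its frequency when it has a real and substantial effect on PT.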
It can be concluded that the performance of the models was dependent on the specific workstation studied, given that a considerable number of features were found to be correlated, and that certain features were found to be significantly less frequent than others. As such, models other than the NN model may be more suitable to represent other workstations on the same production line, depending on the factors influencing the corresponding PT and on the dataset used for training. In other words, selecting a machine-learning model for the estimation system in the case of other workstations should not rely solely on the results of this study; rather, an independent analysis is required for each workstation, since each has unique characteristics.