Reverse Engineering of Radical Polymerizations by Multi-objective Optimization

Jelena Fiosina; Philipp Sievers; Gavaskar Kanagaraj; Marco Drache; Sabine Beuermann

doi:10.20944/preprints202402.0199.v1

Submitted:

02 February 2024

Posted:

05 February 2024

You are already at the latest version

Abstract

Reverse engineering is applied to identify optimum polymerization conditions for the synthesis of polymer with pre-defined properties. The proposed approach uses multi-objective optimization (MOO) and provides multiple candidate polymerization procedures to reach the targeted polymer property. The objectives for optimization include maximal similarity of molar mass distributions (MMD) compared to the target MMD, minimal reaction time, and maximal monomer conversion. The method is tested for vinyl acetate radical polymerizations and can be adopted to other monomers. The data for the optimization procedure is generated by an in-house developed kinetic Monte-Carlo (kMC) simulator for a selected recipe search space. The proposed reverse engineering algorithm comprises several steps: kMC simulations for the selected recipe search space to derive initial data, performing MOO for a targeted MMD, and identification of the Pareto optimal space. The last step uses a weighted sum optimization function to calculate the weighted score of each candidate polymerization conditions. To decrease the execution time clustering of the search space based on MMDs is applied. The performance of the proposed approach was tested for various target MMDs. The suggested MOO-based reverse engineering provides multiple recipe candidates depending on competing objectives.

Keywords:

polymerization reverse engineering

;

clustering

;

multi-objective optimization

Subject:

Chemistry and Materials Science - Polymers and Plastics

1. Introduction

Radical polymerizations are known to be very robust and to provide access to a wide range of polymers with largely differing properties, which are defined by the process conditions. The strong correlation between production process and material properties is due to the complex reaction mechanism consisting of a large number of elemental reactions even for homopolymerizations with a single monomer [1]. The kinetics of the elemental reactions are strongly dependent on the process conditions. Therefore, the prediction of a suitable radical polymerization process to obtain a polymer with targeted properties is challenging. To allow for on demand polymer synthesis, at first sight it appears highly attractive to apply simulations of polymerization processes, e.g., employing differential equations [2,3,4] or kinetic Monte-Carlo (kMC) methods [5,6,7,8]. Simulations are particularly valuable, because detailed information on polymer microstructure at each time moment is available, which is not accessible from polymerization processes. However, this type of simulations cannot be run backwards and on demand suggestion of polymerization conditions to obtain a pre-defined polymer is not feasible. To overcome this issue, reverse engineering has the potential to provide several solutions, as opposed to a single one to one relation between polymerization variables and microstructural properties [9]. From the input variables, a polymerization process model predicts the concentration vs. time profiles and the polymer properties. Inverse modeling, on the other hand, is more difficult and calls for optimization strategies. In order to determine the optimal input values for systems with complex reaction mechanisms that provide pre-defined reaction outputs (such as pre-set conversion, yield, and/or other product properties), it was suggested to intelligently explore the reaction condition search space [10]. Further, it was proposed to solve reverse engineering problems using machine learning (ML) based prediction [11], in which ML regression models based on the random forest algorithm and a multivariate and multi-target regression problem [12] were applied. The model took a targeted MMD and predicted the initial polymerization conditions to produce polymer with this targeted property, minimizing the errors in predicted recipes. However, the prediction of the recipe was based on the MMD only, and not optimized with respect to multiple objectives like reaction time or conversion.

Pareto or multi-objective optimization (MOO) [13] is the process of maximizing or minimizing many objective functions while taking a set of constraints into consideration. Numerous scientific domains, such as engineering [14], economics [15], and logistics [16], need the use of MOO when making optimum decisions facing trade-offs between two or more competing objectives. An increasing use of MOO has been seen in chemical engineering [17,18]. In 2009, Fiandaca et al. [19] used genetic algorithm-based MOO to optimize a pressure swing adsorption process based on the maximization of two objectives: nitrogen recovery and nitrogen purity. In 2013, Ganesan et al. [20] carried out MOO of the combined carbon dioxide reforming and partial-oxidation of methane with respect to three objectives. MOO utilized a gravitational search algorithm and particle swarm optimization to tackle the problem. With respect to technical application it appears highly important to solve the reverse engineering problem as an optimization task with multiple contradicting objectives [13]. Various ML-based optimization strategies were addressed for the purpose of reverse engineering polymerization processes [21,22,23]. A genetic algorithm-based optimizer is proposed by Mohammadi et al. [9] to generate a variety of polymerization recipes at random and to send them to the kMC simulator for error evaluation.

This study provides a MOO-based reverse engineering approach, not only ensuring that the targeted MMD is obtained by means of minimizing the mean squared error (MSE), but also providing minimal reaction time and maximal monomer conversion. Frequently, there are several recipes for obtaining similar MMDs, which are referred to as candidate recipes in the following. The solution of the MOO approach is referred to as polymerization recipe, which includes temperature, reaction time as well as initial monomer and initiator concentrations. In the proposed MOO approach, the data for the selection of the optimal recipes from the search space is based on kMC simulations, as previously reported for training ML models [11]. The simulation provides the dependence of monomer conversion and the corresponding MMD on the polymerization conditions and reaction time. The proposed reverse engineering algorithm consists of several steps. First, the kMC simulations are run for the selected recipe search space to derive the MMDs and monomer concentrations as an input for the MOO step. Then, MOO is applied for a given target MMD and the Pareto optimal space is found on the base of the search space. In the last step, a weighted sum optimization function is used to calculate the weighted score of each candidate recipe, which is used for evaluating the solutions. The best candidate has the smallest score. To accelerate the MOO procedure an additional search space clustering on the basis of MMDs is considered. The approach pursued is illustrated in Figure 1. The method is tested for the following model system: chemically initiated vinyl acetate (VAc) radical polymerization model system at 60 °C using tert-butyl peroxypivalate as initiator.

2. Reverse Engineering Modeling Approach

2.1. Model Development

This section provides a formal description of solving a reverse engineering problem with an MOO approach. State of the art MOO methods [24] like genetic algorithms require to generate and evaluate new recipes in each step. Due to the high number of required steps, it is very demanding to perform on-line kMC simulations in loop [9].

Instead, here a recipe search space R is selected, where each recipe r ∈ R consists of reaction time t, the initial monomer concentration c_m,0, and the initial initiator concentration c_ini,0. Then, the corresponding monomer concentration c_m(r) and molar mass distribution MMD(r) are obtained for each r ∈ R via kMC simulation. In the MOO approach, the input is a target MMD, MMD^target, and the output is a set of optimal candidate recipes R^*:

r_{1}^{*}

,

r_{2}^{*}

, …,

r_{n}^{*}

. R^* is a subset of the recipe search space R, R^*⊂ R with MMD(

r_{i}^{*}

) being close to MMD^target as evaluated on the basis of the MSE as well as the maximal conversion and minimal time.

The optimization variables are presented in Table 1. The lower and upper limits for the variables c_mon,0, c_ini,0, and t are defined by the simulated data. In Table 2 for a specific recipe r, the simulated values c_m(r) and MMD(r) are used to calculate the values of three optimization objective functions: objective for the reaction time, f_t(r), and objective for the mean squared error (MSE) between MMD^target and the predicted MMD(r), f_MSE. f_cm(r) is used to turn the maximization problem of monomer conversion f_conv(r) = (c_m,0 − c_m(r)) / c_m,0 into a minimization problem, in which 1 − f_conv(r) is minimized (Table 2).

The final decision takes user preferences into account by assigning specific weights to the objectives applying the weighted sum method [24,25]. Thus, the multi-objective function can be represented in a single-objective way. Then, the values of this function are calculated for each candidate recipe and the recipes with minimal values of the objective function are selected as a set of optimal solutions. A weight w_i is assigned to each normalized objective function f_i as follows:

\min_{r} f (r) = \min_{r} \sum_{i} w_{i} {∙ f}_{i} (r)

(1)

where

\sum_{i} w_{i} = 1

, i ∈ {MSE, cm, t}, r ∈ R, and R is a polymerization recipe space. For clarity of presentation, the weight w_cm of the objective f_cm is also referred to as the weight of the conversion objective. If required the number of the objectives can be increased, e.g. the conversion of initiator can be added.

The steps of the proposed algorithm are presented in Figure 2 as direct approach. First, a search space R^S ⊂ R is selected (for details see Section 2.2) and for each r ∈ R^S MMD(r) and c_m(r) are obtained via kMC simulations. Then, MOO is performed over R^S. First, the objective function values are calculated as follows: the values for the objective f_t(r) are already included in r as t, the values of the objective f_cm(r) are calculated in advance for all possible r according to Table 2, and the values of the objective f_MSE are specified by MMD^target. Further, based on the calculated objective function values, the Pareto front points R^par ⊂ R^S leading to MMD^target are identified. For this, the points from R^S are represented in the Pareto optimal space, with coordinates specified by the three objectives. The Pareto front points are found in this Pareto optimal space, such that one value of the objective function cannot be improved without downgrading the value of another objective function. Finally, for the Pareto front points R^par, the weights of each objective function are defined and a set of the best recipe candidates R^*⊂ R^par according to Eq. 1 is selected.

The above-described algorithm is improved with respect to the optimization time by means of clustering the search space, which is illustrated in Figure 2 as the clustering-supported approach. Clustering divides the search space R^S into a number of clusters as illustrated in Figure 3. First, the search space R^S is clustered on the base of MMD(r), which allows for selecting a cluster R^target ⊂ R^S containing the MMDs, which are the closest to MMD^target. In general, a larger number of clusters leads to a smaller number of MMDs per cluster gaining higher similarity of the distributions. However, there is less space for optimization regarding other objectives, e.g., such as polymerization time and monomer conversion. For this reason, an appropriate trade-off between the number of clusters and its size has to be identified. Upon appropriate clustering a cluster for the target MMD (Figure 2, red arrows) R^target is found. Then, by MOO the search space is reduced to the number of Pareto front points R^par ⊂ R^target. Finally, after defining objective weights, the best recipe candidates R^* ⊂ R^par are found according to Eq. 1. Since MOO is applied to a single cluster R^target ⊂ R^S, which is considerably smaller than R^S, the optimization time is significantly reduced.

Different methods can be applied for clustering the search space on the basis of MMD. Clustering of distributions and their representation in histograms is an important topic, which attracted a lot of attention, because of specific metrics, which should be used to compare the distributions. One of the most popular and fast clustering methods is the kMeans method [26]. A modified kMeans clustering algorithm was applied to the clustering of histograms [27]. Further, a novel non-parametric clustering algorithm of empirical probability distributions was proposed [28]. Here, the classical kMeans clustering method was used. This algorithm starts with a random separation of the MMDs into clusters. At each step, it recalculates the centroids of each cluster and relocates the data points to the new centroids. The clustering process finishes when the clusters are stable or the given number of iterations is reached. In this study, the simplest Euclidean distances are used for calculation of the distances between multi-dimensional data points, while specific metrics for clustering of distributions are also available [27,29].

There are different strategies for the data generation for the MOO procedure: use of exclusively in advance generated data from kMC simulation (Figure 2, blue arrows), on demand kMC-generated data, ML-generated data, kMC-based and ML-generated hybrid data sets, etc. Currently, as a first step, the focus is exclusively on the use of kMC-simulated data.

2.2. Data acquisition and Processing

The in-house developed kMC simulator mcPolymer was used to carry out the simulations required for the generation of polymerization data according to the search space [30]. The simulator allows for exporting the concentration profiles of all reactants and products as well as microstructural data like MMD, chain composition and branching of all polymeric species involved in the process. The simulator output was adapted to be well-machine readable. The data were filtered, further abstracted, logically connected, and stored in the well-structured no-SQL database MongoDB. The kMC simulated MMDs and monomer concentrations were obtained for the selected search space R^S, allowing for MOO and subsequent finding of the weighted optimal solution.

The kMC simulations were performed for radical polymerizations with VAc as monomer, tert. butyl peroxypivalate as initiator, and methanol as solvent. The simulations are based on a full kinetic model for the VAc radical polymerization containing all elemental reactions [31]. The following polymerization conditions were used: constant temperature of 60 °C, c_ini,0 in the range of 1.0 to 20.0 mmol·L⁻¹, and c_m,0 in the range of 2.0 to 5.0 mol·L⁻¹ with uniformly distributed grid size of c_ini,0 (geometrically scaled grid points) and c_m,0 (arithmetic scaled grid points), resulting in 225 simulations of the process. The geometric scale was selected for c_ini,0 to put more attention on the small values of this parameter. The polymerization process was simulated for a constant reaction time of 6 hours and the properties of interest were recorded every 20 minutes, thus, obtaining in total 18 data points at different time moments for each investigated property. Thus, the data set contains 4050 different MMDs.

For training the ML prediction models for single-objective reverse engineering the obtained data set was divided into training and test set in proportion of 80:20. The same data was used for MOO, again taking 80 % of the data as training set for the search space R^S and 20 % of the data as test set R^test. The test set R^test contains 810 recipes, which correspond to a set MMD^test consisting of 810 kMC-simulated MMDs. The evaluation of all optimization approaches was performed with MMD^test with each element serving as MMD^target. In order to test the MMO approach a single MMD is selected from MMD^test and used as MMD^target for the MMO approach. The performance of the MMO is evaluated by testing it with every MMD from MMD^test.

3. Results and Discussion

In order to compare results obtained via the previously described ML modeling-based reverse engineering strategy [11] with data from the MOO approach introduced in this work, the technique reported for butyl acrylate was adopted for the polymerization of vinyl acetate. Previously, it was described how an ML regression model provides a recipe r = (c_m,0, c_ini,0, T) for a fixed time for a given MMD^target. c_m,0 and c_ini,0 are initial concentrations of the monomer and the initiator, respectively, and T is the polymerization temperature. The prediction was performed with the random forest method for a target MMD to minimize the errors in the predicted recipe r(MMD). In the case of VAc polymerizations, the temperature is kept constant at T = 60 °C and the reaction time t was predicted by the model: r = (c_m,0, c_ini,0, t). The reverse engineering result for a sample MMD^target presented in Figure 4 corresponds to the recipe (c_m,0; c_ini,0; t)/ (2.0 mol/L; 1.9 mmol/L; 140 min). Only very small differences between the target and predicted MMD as well as the predicted recipe are seen. The overall performance of the developed model was evaluated using the R² determination coefficient, which is a measure for the quality of the prediction by the ML model. Keeping in mind that R²=1 indicates perfect prediction, it is remarkable to note that a rather low value of 0.78 for R² is obtained while visual inspection of the MMDs for the representative example given in Figure 4 indicates only minor differences. However, this ML-based approach does not consider multiple contradicting optimization objectives and does not provide multiple alternative solutions as proposed in the Pareto optimization.

3.1. Direct Pareto Optimization

Figure 5A shows all Pareto front points with three objectives f_i, i ∈ {MSE, cm, t}. After definition of objective weights, the optimal solution from the Pareto front points is selected, which satisfies the weighted score calculated with Eq. 1 best. Several combinations of objective weights are considered: time focused, conversion focused, MSE focused. For example, in the time focused case, the weight of time is higher than the other weights. To identify the optimal recipe leading to the target MMD, the MSE should have a higher weight than the other objectives. To reduce the number of Pareto front points an MSE limit is chosen. As an example, the data in Figure 3 was obtained with an MSE limit of 2∙10⁻³.

The optimization procedure is illustrated in Figure 5 for an example target MMD, which corresponds to the recipe (c_m,0; c_ini,0; t) / (2.0 mol/L; 1.9 mmol/L; 140 min). Rather than presenting the objective f_cm, which represents the minimized monomer concentration, the conversion is shown in Figure 5A,B. Figure 5B demonstrates the reduction of the number of Pareto front points from 340 points (A) to 46 points (B) by filtering with the MSE limit of 2∙10⁻³. The MMDs of all Pareto front points and of the filtered Pareto front points are given in Figure 5C,D, respectively. The filtering approach allows for considering only MMDs of similar shape. Figure 6 shows the impact of different MSE limits on the number of Pareto front points. Even with an MSE limit of 10⁻³ the average number of points is around 30, which is a relevant number for consideration of other objectives. The minimal number of filtered Pareto points for the MSE limit of 10⁻³ is about 10.

Figure 7 allows for comparison of MSEs obtained via ML prediction-based (left) and MOO-based approaches (right) for reverse engineering calculated for all elements of MMD^test. Minimal MSEs are preferable. For the ML-based prediction (left), the MSE values of almost all MMD^target from MMD^test are less than 10⁻³, while most MSEs obtained by the MOO-based approach with an MSE limit of 2∙10⁻³ are by two orders of magnitude lower. Most values are less than 10⁻⁵.

The next step is to evaluate the candidate recipes based on the weighted sum of the objective values according to Eq. 1. Figure 8 and Table 3 present the candidate recipes for the different combinations of objective weights indicated. In Figure 8, a color code is assigned to the individual candidate recipes according to the score of the weighted sum of the values of the three objective functions f_MSE, f_cm, f_t calculated according to Eq. 1. For the weights w_MSE = 0.9, w_cm = 0.02, and w_t = 0.08 in Figure 8A the focus is on the MSE. As a minimization problem is solved, a minimal score value (visualized by big pink points) determines the best candidates, given by the recipes with IDs 0, 15, 16 in Table 3. Other weight combinations with focus on the monomer conversion in Figure 8B and with equal weights in Figure 8C demonstrate, which recipe candidates are selected depending on specific requirements.

3.2. Clustering Supported Pareto Optimization

The clustering-supported optimization is performed for the same MMD^target as in the previous section. It is assumed that all points contained in the target cluster R^target have satisfactory MSEs. For this reason, only the two objectives f_cm and f_t have been chosen to be optimized. Figure 9 shows R^target in the coordinate space, which corresponds to the objective functions. The Pareto front points R^par ⊂ R^target are indicated by colored markers. Table 4 presents all Pareto front points with the corresponding recipes and objective weights.

To find the weighted solution for a time focused result with w_cm = 0.2 and w_t = 0.8 the candidate with ID 1 reaching a conversion of 47 % at a reaction time of 40 min is best. If conversion is considered to be more important than reaction time as in the case of w_cm = 0.8 and w_t = 0.2, the candidate with ID 4 is the best solution leading to a conversion of 77 % with a reaction time of 100 min.

MMD clustering focuses only on a single objective function f_MSE when identifying the most similar cluster for the target MMD. The accuracy is determined on the basis of the MSE, which represents the deviation of the selected solution from MMD^target and depends on the cluster size, and consequently, on the number of clusters. For a small number of clusters, every cluster contains a broader spectrum of MMDs as compared to a large number of clusters and leads to an increase in the MSEs of MMD^target and the MMDs contained in this cluster. However, the cluster covers a larger value range of the objective function, allowing for a broader range of predicted recipes. Figure 10 depicts how the MMD^t^arget can be reached for clustering the search space into 20, 40, and 60 clusters, respectively. Figure 10A–C shows the points of the target clusters in the objective function space. The shape of the clusters is similar. As the number of clusters grows, the number of candidates per cluster decreases, because there are fewer points fitting the required accuracy with respect to the MMD. Therefore, the size of the whole cluster shrinks. The red points in Figure 10 mark the Pareto front points, from which the optimal candidates are selected.

Figure 10D–F shows the MMDs for all Pareto front points. With increasing number of clusters, the MSE of MMD^target and Pareto front MMDs decreases. The objective function value range also decreases as illustrated in Table 5. For the example, given f_MSE of MMD^target improves by a factor of 100 when going from a total number of 20 clusters to a total number of 60 clusters. This improvement is achieved by a loss in conversion, which is reduced from 58 % to 27 %. In this example, reaction time is also lowered with increasing number of clusters. The last two rows in Table 5 provide the scores calculated by Eq. 1 for the best recipes obtained with clustering of the search space into 20, 40, and 60 clusters for two cases with a different number of objectives. Note, that only the scores obtained with the same combination of weights (given in a single row of Table 10) are comparable. For equal weights of two objectives (w_cm = 0.5, w_t = 0.5) the minimal score of 0.39 was obtained with clustering into 40 clusters. Although the optimization has been carried out within one cluster using only two objectives, the results can be related to the direct Pareto optimization, including the MSE objective. It is shown that for equal weight of three objectives (w_MSE = 1/3, w_cm = 1/3, w_t = 1/3) the minimal score of 0.30 was also obtained with the same clustering into 40 clusters.

The comparison of the direct and the clustering-supported Pareto approach is based on the MSE limit. For the direct approach, the MSE limit can be chosen by the operator. For the clustering-supported approach, the individual MSE limit for a given MMD^target, which is evidently depending on the number of clusters, is defined as the maximal MSE value for all MMDs contained in R^target.

Both approaches were compared on the basis of MMD^test. Then, for the clustering supported approach, the MSE limit is defined as the maximal value of all individual MSE limits over MMD^test. Figure 11 shows the dependency of the MSE limit on the total number of clusters. For more than 30 clusters, the MSE limit is almost constant at a level of 0.003. The MSE limit increases rapidly for less than 30 clusters.

Figure 12A illustrates the average number of Pareto front points depending on the MSE limit. Increasing the MSE limit results in a reduction in the number of clusters (see Figure 11) associated with an enlargement in cluster size, and thereby leading to a higher count of Pareto front points. The clustering-supported approach is more selective considering the average number of Pareto front points. The standard deviation is used to describe the value ranges of the objective function and of the MSE function. Wider ranges are advantageous, because they provide more space for optimization. Averaging the value ranges of each objective function over MMD^test yields the average ranges illustrated in Figure 12B–D. The value ranges of the reaction time and conversion are wider (Figure 12C,D) for the clustering supported approach, while the MSE value range is more restricted (Figure 12B). The findings indicate that the clustering approach allows for a better optimization of monomer conversion and reaction time while the direct approach is better suited for optimization of the MSEs.

Table 6 presents the execution time for both approaches for the test set MMD^test consisting of 810 target MMDs. For the clustering-supported approach, the optimization time depends slightly on the number of clusters. However, it is by a factor of ten smaller than for the direct approach.

The rapidness of the MOO algorithm is especially important, for the next step, when in future genetic algorithms together with ML models will be applied to extend the search space for the polymerization parameters, because Pareto optimization should be executed many times for each evolution stage of genetic algorithms. In this situation, the clustering supported Pareto optimization is preferable.

4. Conclusions and Outlook

The study bridges the gap between multi-objective optimization (MOO) and its application for reverse engineering of polymerization processes. The proposed MOO models allow for simulation-supported determination of an optimal recipe for a targeted molar mass distribution, taking multiple objectives, e.g., minimal reaction time, maximal conversion of monomer, and similarity of the found MMD compared to the target MMD into consideration. The proposed approach can be accelerated by an additional clustering of the simulated MMDs. Moreover, it is possible to obtain a number of suitable candidates considering different weights of the selected objectives. A set of alternative optimal solutions is obtained for each specific combination of weights. First insights are provided on the formulation of a polymerization reverse engineering task as a MOO problem. In future, the understanding gained in combination with previously proposed ML models [11] for polymerizations will be applied to facilitate and to speed-up finding of optimal solutions with a limited number of kMC simulations. ML models can be used to set-up the search space and to evaluate the candidate polymerization procedures rather than using kMC simulations. The search for candidate solutions will be performed with the help of genetic algorithms. Moreover, the consideration of more complex microstructural details, e.g., such as branching in high-temperature polymerization of acrylate [32], may require to increase the number of objectives in MOO.

Data Availability Statement

The data presented in this study are openly available in Mendeley Data at doi:10.17632/hdrwmxgx27.1 with reference: https://data.mendeley.com/preview/hdrwmxgx27?a=d024c5aa-a100-473f-b641-bc0d3f11ebb5.

Acknowledgments

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 466601458 – within the Priority Programme “SPP 2331: Machine Learning in Chemical Engineering”.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Ballard, N.; Asua, J.M. Radical polymerization of acrylic monomers: An overview. Prog. Polym. Sci. 2018, 79, 40–60. [Google Scholar] [CrossRef]
Meszena, Z.G.; Johnson, A.F. Modelling and simulation of polymerisation processes. Comput. Chem. Eng. 1999, 23, 375–378. [Google Scholar] [CrossRef]
Saldívar-Guerra, E. Numerical Techniques for the Solution of the Molecular Weight Distribution in Polymerization Mechanisms, State of the Art. Macromol. React. Eng. 2020, 14, 2000010. [Google Scholar] [CrossRef]
Gómez-Reguera, J.A.; Vivaldo-Lima, E.; Gabriel, V.A.; Dubé, M.A. Modeling of the Free Radical Copolymerization Kinetics of n-Butyl Acrylate, Methyl Methacrylate and 2-Ethylhexyl Acrylate Using PREDICI®. Processes 2019, 7, 395. [Google Scholar] [CrossRef]
Trigilio, A.D.; Marien, Y.W.; Van Steenberge, P.H.M.; D’hooge, D.R. Gillespie-Driven kinetic Monte Carlo Algorithms to Model Events for Bulk or Solution (Bio)Chemical Systems Containing Elemental and Distributed Species. Ind. Eng. Chem. Res. 2020, 59, 18357–18386. [Google Scholar] [CrossRef]
Peikert, P.; Pflug, K.M.; Busch, M. Modeling of High-Pressure Ethene Homo- and Copolymerization. Chem. Ing. Tech. 2019, 91, 673–677. [Google Scholar] [CrossRef]
Trigilio, A.D.; Marien, Y.W.; Van Steenberge, P.H.M.; D’hooge, D.R. Toward an Automated Convergence Tool for Kinetic Monte Carlo Simulation of Conversion, Distributions, and Their Averages in Non-dispersed Phase Linear Chain-Growth Polymerization. Ind. Eng. Chem. Res. 2023, 62, 2583–2593. [Google Scholar] [CrossRef]
Brandão, A.L.T.; Soares, J.B.P.; Pinto, J.C.; Alberton, A.L. When Polymer Reaction Engineers Play Dice: Applications of Monte Carlo Models in PRE. Macromol. React. Eng. 2015, 9, 141–185. [Google Scholar] [CrossRef]
Mohammadi, Y.; Saeb, M.R.; Penlidis, A.; Jabbari, E.; J Stadler, F.; Zinck, P.; Matyjaszewski, K. Intelligent Machine Learning: Tailor-Making Macromolecules. Polymers 2019, 11, 579. [Google Scholar] [CrossRef]
Mohammadi, Y.; Penlidis, A. Polymerization Data Mining: A Perspective. Adv. Theory Simul. 2019, 2, 1800144. [Google Scholar] [CrossRef]
Fiosina, J.; Sievers, P.; Drache, M.; Beuermann, S. Polymer reaction engineering meets explainable machine learning. Comput. Chem. Eng. 2023, 177, 108356. [Google Scholar] [CrossRef]
Spyromitros-Xioufis, E.; Tsoumakas, G.; Groves, W.; Vlahavas, I. Multi-target regression via input space expansion: treating targets as inputs. Mach. Learn. 2016, 104, 55–98. [Google Scholar] [CrossRef]
Ngatchou, P.; Zarei, A.; El-Sharkawi, A. Pareto Multi Objective Optimization. In Proceedings of the 13th International Conference on Intelligent Systems Application to Power Systems, 2005. 13th International Conference on, Intelligent Systems Application to Power Systems, Arlington, Virginia, USA, 06–10 Nov. 2005; IEEE / Institute of Electrical and Electronics Engineers Incorporated. 2005; pp. 84–91, ISBN 1-59975-174-7. [Google Scholar]
Gunantara, N. A review of multi-objective optimization: Methods and its applications. Cogent Eng. 2018, 5, 1502242. [Google Scholar] [CrossRef]
Doumpos, M.; Zopounidis, C. Multi-objective optimization models in finance and investments. J. Glob. Optim. 2020, 76, 243–244. [Google Scholar] [CrossRef]
Gupta, S.; Haq, A.; Ali, I.; Sarkar, B. Significance of multi-objective optimization in logistics problem for multi-product supply chain network under the intuitionistic fuzzy environment. Complex Intell. Syst. 2021, 7, 2119–2139. [Google Scholar] [CrossRef]
Bhaskar, V.; Gupta, S.K.; Ray, A.K. APPLICATIONS OF MULTIOBJECTIVE OPTIMIZATION IN CHEMICAL ENGINEERING. Rev. Chem. Eng. 2000, 16, 1–54. [Google Scholar] [CrossRef]
Murugan, C.; Subbaian, S. Multi-Objective Optimization for Enhanced Ethanol Production during Whey Fermentation. In 2022 International Conference on Power, Energy, Control and Transmission Systems (ICPECTS), Chennai, India, 08–09 Dec. 2022; IEEE, 2022; pp 1–7, ISBN 978-1-6654-6275-4.
Fiandaca, G.; Fraga, E.S.; Brandani, S. A multi-objective genetic algorithm for the design of pressure swing adsorption. Eng. Optim. 2009, 41, 833–854. [Google Scholar] [CrossRef]
Ganesan, T.; Elamvazuthi, I.; Ku Shaari, K.Z.; Vasant, P. Swarm intelligence and gravitational search algorithm for multi-objective optimization of synthesis gas production. Appl. Energy 2013, 103, 368–374. [Google Scholar] [CrossRef]
Charoenpanich, T.; Anantawaraskul, S.; Soares, J.B.P. Using Artificial Intelligence Techniques to Design Ethylene/1-Olefin Copolymers. Macromol. Theory Simul. 2020, 29, 2000048. [Google Scholar] [CrossRef]
Dragoi, E.N.; Curteanu, S. The use of differential evolution algorithm for solving chemical engineering problems. Rev. Chem. Eng. 2016, 32, 149–180. [Google Scholar] [CrossRef]
Fernandes, F.A.N.; Lona, L.M. Neural network applications in polymerization processes. Braz. J. Chem. Eng. 2005, 22, 401–418. [Google Scholar] [CrossRef]
Konak, A.; Coit, D.W.; Smith, A.E. Multi-objective optimization using genetic algorithms: A tutorial. Reliab. Eng. Syst. Saf. 2006, 91, 992–1007. [Google Scholar] [CrossRef]
Marler, R.T.; Arora, J.S. The weighted sum method for multi-objective optimization: new insights. Struct. Multidisc. Optim. 2010, 41, 853–862. [Google Scholar] [CrossRef]
Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A K-Means Clustering Algorithm. Appl. Stat. 1979, 28, 100–108. [Google Scholar] [CrossRef]
Nielsen, F.; Nock, R.; Amari, S. On Clustering Histograms with k-Means by Using Mixed α-Divergences. Entropy 2014, 16, 3273–3301. [Google Scholar] [CrossRef]
Henderson, K.; Gallagher, B.; Eliassi-Rad, T. EP-MEANS: An Efficient Nonparametric Clustering of Empirical Probability Distributions. In Proceedings of the 30^th Annual ACM Symposium on Applied Computing. SAC 2015: Symposium on Applied Computing, Salamanca Spain, 13–17 Apr. 2015; Wainwright, R.L., Corchado, J.M., Bechini, A., Hong, J., Eds.; ACM: New York, NY, USA, 2015; pp 893–900, ISBN 9781450331968.
Banerjee, A.; Merugu, S.; Dhillon, I.S.; Ghosh, J. Clustering with Bregman Divergences. J. Mach. Learn. Res. 2005, 6, 1705–1749. [Google Scholar]
Drache, M.; Drache, G. Simulating Controlled Radical Polymerizations with mcPolymer—A Monte Carlo Approach. Polymers 2012, 4, 1416–1442. [Google Scholar] [CrossRef]
Feuerpfeil, A.; Drache, M.; Jantke, L.-A.; Melchin, T.; Rodríguez-Fernández, J.; Beuermann, S. Modeling Semi-Batch Vinyl Acetate Polymerization Processes. Ind. Eng. Chem. Res. 2021, 60, 18256–18267. [Google Scholar] [CrossRef]
Mätzig, J.; Drache, M.; Drache, G.; Beuermann, S. Kinetic Monte Carlo Simulations as a Tool for Unraveling the Impact of Solvent and Temperature on Polymer Topology for Self-Initiated Butyl Acrylate Radical Polymerizations at High Temperatures. Macromol. Theory Simul. 2023, 32, 2300007. [Google Scholar] [CrossRef]

Figure 1. Illustration of the polymerization reverse engineering approach by means of MOO.

Figure 2. Direct Pareto optimization vs. clustering supported Pareto optimization.

Figure 3. Illustration of the clustering approach of the search space. In the middle two exemplary clusters of MMDs are presented, on the right all clusters defined are given.

Figure 4. An example of reverse engineering with ML modeling comparing the MMD (left) and the recipe (right) for a single MMD^target, with MMD^target being an element of MMD^test. The target recipe is the recipe associated with MMD^target.

Figure 5. Pareto front points (A), filtered Pareto front points (B), MMDs for all Pareto front points (C), and MMDs for filtered Pareto front points (D). The target MMD is given in bold black line, the MSE limit was set to 2∙10⁻³.

Figure 6. The influence of the MSE limit on the distribution of the number of filtered Pareto front points based on MMD^test.

Figure 7. MSEs of the MMDs of the recipes obtained by ML prediction based (left) and MOO (right) approaches on the basis of MMD^test.

Figure 8. Influence of the combination of objective weights on the selection of the optimal recipes for a sample MMD^target: time focused (w_MSE = 0.05, w_cm = 0.05, w_t = 0.9) (A), monomer conversion focused (w_MSE = 0.01, w_cm = 0.8, w_t = 0.19) (B), equal weights (w_MSE = 1/3, w_cm = 1/3, w_t = 1/3) (C), MSE focused (w_MSE = 0.9, w_cm = 0.09, w_t = 0.01) (D). The best three solutions are shown with their IDs for each case.

Figure 9. Clustering supported Pareto optimization. A: Illustration of the target cluster R^target. The Pareto front points R^par ⊂ R^target are marked (colored markers) and are listed in Table 4. A time focused result (w_cm = 0.2, w_t = 0.8), an equal weighted result (w_cm = 0.5, w_t = 0.5) and a conversion focused result (w_cm = 0.8, w_t = 0.2) are highlighted (1, 2, 3). The corresponding MMDs in comparison with MMD^target are shown in B.

Figure 10. Cluster size dependence. A-C: Clusters for MMD^target with a total number on 20 (A), 40 (B) or 60 (C) clusters. D-F: MMDs for the Pareto front points.

Figure 11. MSE limit calculated for the clustering of the search space.

Figure 12. Comparison of the direct and the clustering-supported approach considering (A) the average number of Pareto front points, (B) the average MSE range, (C) the average polymerization time range, and (D) the average conversion range.

Table 1. Optimization variables.

variables	description	restrictions
c_m,0	initial monomer concentration	c_m,0(min) ≤ c_m,0 ≤ c_m,0(max)
c_ini,0	initial initiator concentration	c_ini,0(min) ≤ c_ini,0 ≤ c_ini,0(max)
t	reaction time	t_min ≤ t ≤ t_max
r=[c_m,0, c_ini,0, t]	initial recipe

Table 2. Optimization objective functions and their calculation on the basis of simulated data.

objectives	description
$\underset{r}{m i n}$ f_MSE(r)= $\min_{r} [M S E ({M M D}^{t a r g e t}, M M D (r))]$	minimal MSE, where MMD(r) is simulated
$\underset{r}{m i n}$ f_cm(r)= $\min_{r} [1 - f_{c o n v} (r)] = \min_{r} \frac{c_{m} (r)}{c_{m, 0}}$	minimal relative monomer concentration
$\underset{r}{m i n}$ f_t(r) = $\underset{r}{m i n}$ t	minimal reaction time (directly from r)

Table 3. The best recipes for each combination of objective weights (the focal weight is given in bold).

IDs of the best recipe		1	34	35		46	20	33
	w_i	time focus (A)			w_i	conversion focus (B)
c_m,0 / mol∙L⁻¹		2.0	2.21	2.21		2.21	2.43	2.21
c_ini,0 / mmol∙L⁻¹		20	20	20		8.4	16.1	10.5
time / min	0.9	20	40	60	0.19	120	80	100
MSE /∙10⁻³	0.05	1.63	1.37	0.35	0.01	1.72	1.58	1.16
conversion / %	0.05	25.9	46.6	61.8	0.8	71.3	69.6	68.7

IDs of the best recipe		35	32	21		43	12	0
	w_i	equal weights (C)			w_i	MSE focus (D)
c_m,0 / mol∙L⁻¹		2.21	2.21	2.21		2.0	2.0	2.0
c_ini,0 / mmol∙L⁻¹		20.0	10.5	13.0		12.4	10.0	8.5
time / min	1/3	60	80	80	0.01	180	200	60
MSE / ∙10⁻⁵	1/3	35	31	54	0.9	0.086	0.032	0.64
conversion / %	1/3	61.8	60.2	64.4	0.09	48.3	48.2	45.1

Table 4. Pareto front points R^par and their recipes and objective functions. Every highlighted point (bold letters) is the best result for the corresponding example objective weights.

objective weights	cand. ID	c_m,0 / mol∙L⁻¹	c_ini,0 / mmol∙L⁻¹	time / min	MSE / 10⁻³	conversion / %
	0	2.00	20.0	20	1.63	25.9
time focus: (w_cm = 0.2, w_t = 0.8)	1	2.21	20.0	40	1.37	46.6
	2	2.43	20.0	60	1.94	62.7
equal weights: (w_cm = 0.5, w_t = 0.5)	3	2.43	20.0	80	1.81	73.7
conversion focus: (w_cm = 0.8, w_t = 0.2)	4	2.43	16.1	100	2.61	77.7
	5	2.43	6.86	160	2.79	78.3
	6	2.43	5.54	180	2.83	78.5
	7	2.43	2.91	260	2.98	79.0

Table 5. Pareto optimal results with balanced weights for different number of clusters.

Number of clusters		20	40	60
cand. ID/		3	2	1
property	w_i
c_m,0 / mol∙L⁻¹		3.50	3.07	2.86
c_ini,0 / mmol∙L⁻¹		13.0	20.0	20.0
time / min	0.5	60.0	40.0	20.0
MSE /∙10^-3		3.04	0.32	0.03
conversion / %	0.5	58.2	49.2	27.0
score (w_t = 0.5, w_cm = 0.5)		0.50	0.39	0.50
score (w_MSE = 1/3, w_cm = 1/3, w_t = 1/3)		0.67	0.30	0.33

Table 6. Comparison of optimization time of direct and clustering supported approaches.

approach	direct	clustering supported approach
		20 clusters	40 clusters	60 clusters	80 clusters	100 clusters
execution time, s	679	56	25	18	14	12

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.