1. Introduction
Dielectric permittivity is a fundamental electrical property that characterizes the response of a material subjected to an electric field [1]. It is intrinsically related to the dielectric constant (ε) and reflects the ability of the electrical dipoles within the material’s structure to align and orient in response to an externally applied electric field. The greater the polarizability of the molecules, the greater the value of ε [2]. This property is a fundamental molecular characteristic and is commonly used to predict other electrical properties of polymers [3,4,5]. It is relevant to materials physics, chemistry, electrical engineering, and polymer science [1]. Its applications include high energy density capacitors [6], high voltage cables [6], microelectronics [7], and photovoltaic devices [8,9].
However, calculating the dielectric constant of polymers theoretically is a multifaceted challenge. This inherently nonlinear property depends on factors such as temperature, frequency, polymer structure, composition, sample morphology, impurities, fillers, plasticizers, and other additives [4,10]. Furthermore, each application demands a specific range of the polymer's dielectric constant (ε), tailored to the requirements of the problem at hand [5]. Understanding these dependencies is crucial for designing new materials. Given this complexity, there is significant demand for machine learning (ML) models that predict these properties efficiently, saving both time and resources.
In materials science and cheminformatics, the Quantitative Structure-Property Relationship (QSPR) methodology stands out as an important machine learning-based approach. It relies on machine learning models to predict or explain compound properties from chemical descriptors [11]. Both the quality of a model's predictions and its capacity to reveal relationships between a material's molecular or other microscopic physical properties and the target property depend strongly on the careful selection of descriptors [11]. The QSPR approach has proven effective in predicting various properties, including the glass transition temperature (Tg) of polymers [12,13] and of polymer coating materials [14]. Several QSPR models have also been developed for predicting dielectric permittivity in polymers [1,15,16,17], using different datasets, feature-representation methods, and variable selection procedures.
For instance, Liu et al. [17] developed a QSPR model to predict dielectric permittivity using a small dataset of 22 polyalkenes. The resulting model, built using multiple linear regression analysis (MLRA), had a high R²train value of 0.907 and a standard error (s) of 0.001 on the training set. Three quantum descriptors were selected: ELUM (energy of the lowest unoccupied molecular orbital), q- (minimum negative atomic charge) and S (configurational entropy of the system). The authors thoroughly explored the physical significance of these descriptors, linking them to polymer polarizability and charge separation capability.
In a subsequent study in 2016, Wu et al. [16] developed a model to predict the dielectric constant of 58 polymers. They employed Partial Least Squares (PLS) regression as the modeling technique, incorporating the Infinite Chain Descriptors (ICD) 2D, TAE and GAP_inf3_inv. On the training set the model showed an R²train of 0.91 and a Root Mean Square Error (RMSE) of 0.11. On an external test set it showed strong predictive capability, reaching an R²test of 0.96; the RMSE was 0.11 in both cases. Finally, in a more recent study, Yevhenii et al. [1] used a data set of 71 polymer samples. They applied genetic algorithms (GA) and multiple linear regression analysis (MLRA) to select optimal descriptors and develop QSAR models. Two models were created. The first used five descriptors, achieving an R²train of 0.842 and a standard error (s) of 0.187. The second incorporated eight descriptors and gave improved results, with an R²train of 0.905 and s of 0.151. Both models showed robust predictive performance under external validation, with R²test of 0.829 and 0.81, respectively.
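The statistics quoted for these earlier models can be made concrete with a short sketch. It assumes the usual coefficient-of-determination and root-mean-square-error definitions, which the cited studies may or may not follow exactly; the function names and the toy values are hypothetical and are not data from any of the papers.

```python
from math import sqrt

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error of the predictions."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical dielectric constants (illustrative only, not from the paper)
y_true = [2.0, 3.0, 4.0, 5.0]
y_pred = [2.5, 3.0, 3.5, 5.0]

print("R2   =", round(r_squared(y_true, y_pred), 3))
print("RMSE =", round(rmse(y_true, y_pred), 3))
```

Computing the same two functions separately on the training set and on a held-out test set gives the R²train and R²test style of reporting used above.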
Although all of these earlier publications report QSAR/QSPR studies that predict the dielectric permittivity of different polymers, they have certain limitations. First, not all of the models were validated on a separate test set. Second, the published data sets are small, which restricts the applicability domain of the models.
In this work, a QSAR model was developed using Gradient Boosting (GB), a sequential ensemble method that improves predictive accuracy by iteratively combining and adjusting weak models. The method is powerful because it updates the weights after each iteration, so that each model in the sequence corrects its predecessors and overall accuracy improves progressively [18,19]. GB has been used successfully in QSAR models to predict the bandgap [20] and the glass transition temperature [21] of polymers, with R²train above 0.90 in both cases, and high prediction quality has been achieved even with many descriptors without overfitting [22].
The models in this study were built with a data set of 86 polymers. Two models (GB_A and GB_B), using eight and six descriptors, respectively, were evaluated by cross-validation and on external data sets. Several model parameters were tuned using a grid search. The optimized model predicted the dielectric constant effectively for various types of polymers. In addition, the Accumulated Local Effects (ALE) approach was used to visualize the individual impact of each descriptor on the dielectric permittivity predictions. ALE plots are effective tools for both visualizing and quantifying the individual influence of each input on the prediction [23].
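The combination of grid search and cross-validation mentioned above can be sketched as follows. To keep the block short and dependency-free, a deliberately simple one-descriptor k-nearest-neighbour regressor stands in for the Gradient Boosting model; the parameter grid, the fold assignment, the function names and the toy data are all hypothetical and do not reproduce the settings of this study.

```python
from itertools import product
from statistics import mean

def knn_predict(train, x, k, weighted):
    """Predict y at x from the k nearest training points (1-D feature)."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    if not weighted:
        return mean(y for _, y in nearest)
    weights = [1.0 / (abs(xi - x) + 1e-9) for xi, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def k_fold_indices(n, n_folds):
    """Assign sample i to fold i % n_folds (deterministic, no shuffling)."""
    return [[i for i in range(n) if i % n_folds == f] for f in range(n_folds)]

def cv_rmse(data, params, n_folds=4):
    """Cross-validated RMSE: every sample is predicted while held out."""
    sq_errors = []
    for held_out in k_fold_indices(len(data), n_folds):
        held = set(held_out)
        train = [pt for i, pt in enumerate(data) if i not in held]
        for i in held_out:
            x, y = data[i]
            sq_errors.append((y - knn_predict(train, x, **params)) ** 2)
    return mean(sq_errors) ** 0.5

def grid_search(data, grid, n_folds=4):
    """Score every parameter combination and return the best one."""
    names = list(grid)
    results = []
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        results.append((cv_rmse(data, params, n_folds), params))
    return min(results, key=lambda r: r[0]), results

# Hypothetical toy data: (descriptor value, target value) pairs
data = [(0.0, 2.1), (1.0, 2.3), (2.0, 2.2), (3.0, 2.9), (4.0, 3.4), (5.0, 3.3),
        (6.0, 4.0), (7.0, 4.2), (8.0, 4.1), (9.0, 4.8), (10.0, 5.1), (11.0, 5.0)]
grid = {"k": [1, 2, 3], "weighted": [False, True]}

(best_score, best_params), all_results = grid_search(data, grid)
print("best parameters:", best_params, "CV RMSE:", round(best_score, 3))
```

Selecting parameters by the cross-validated score rather than the training score keeps the external test set untouched, so it can still give an independent estimate of predictive performance.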
To the best of our knowledge, only one study to date has used the ALE method to elucidate the mechanistic relationships in a nonlinear QSAR model, in that case for toxicity (log LD50) [23]. No previous studies applying this approach to dielectric permittivity have been identified.
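As an illustration of how a first-order ALE curve is obtained for a single descriptor, the sketch below bins the descriptor at empirical quantiles, averages the change in prediction across each bin for the samples that fall in it, accumulates these local effects and centers the curve. It is a simplified sketch, not the procedure or software used in this work: `ale_1d`, `toy_model`, the number of bins and the data are assumptions, and the toy model is linear so that the result can be checked by hand.

```python
def ale_1d(predict, rows, feature, n_bins=4):
    """First-order ALE of `feature` for a model `predict(row) -> float`.

    `rows` is a list of dicts mapping descriptor names to values.
    Returns the bin edges and the centered ALE value at each edge.
    """
    values = sorted(row[feature] for row in rows)
    n = len(values)
    # Bin edges at empirical quantiles; duplicate edges are merged.
    edges = sorted({values[min(n - 1, round(q * (n - 1) / n_bins))]
                    for q in range(n_bins + 1)})
    if len(edges) < 2:
        raise ValueError("feature is constant; ALE is undefined")

    effects, counts = [], []
    for k in range(1, len(edges)):
        lo, hi = edges[k - 1], edges[k]
        members = [r for r in rows
                   if lo < r[feature] <= hi or (k == 1 and r[feature] == lo)]
        diffs = []
        for r in members:
            upper = dict(r, **{feature: hi})  # same sample, feature at bin top
            lower = dict(r, **{feature: lo})  # same sample, feature at bin bottom
            diffs.append(predict(upper) - predict(lower))
        effects.append(sum(diffs) / len(diffs) if diffs else 0.0)
        counts.append(len(members))

    # Accumulate the local effects over the bins.
    ale = [0.0]
    for e in effects:
        ale.append(ale[-1] + e)

    # Center the curve so its count-weighted average is zero.
    mids = [(a + b) / 2.0 for a, b in zip(ale, ale[1:])]
    offset = sum(m * c for m, c in zip(mids, counts)) / sum(counts)
    return edges, [a - offset for a in ale]

def toy_model(row):
    """Hypothetical model: positive effect of 'a', negative effect of 'b'."""
    return 3.0 * row["a"] - 2.0 * row["b"]

# Hypothetical descriptor table (not data from this study)
rows = [{"a": float(i), "b": float((i * 7) % 5)} for i in range(20)]

for name in ("a", "b"):
    edges, ale = ale_1d(toy_model, rows, name)
    print(name, list(zip(edges, [round(v, 2) for v in ale])))
```

For this linear toy model the ALE curve of `a` rises with slope 3 and that of `b` falls with slope 2, which mirrors how a positive or negative descriptor effect would be read from an ALE plot.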
4. Conclusions
A model was developed to predict the dielectric constant (ε) of various polymers, together with a detailed explanation from a mechanistic perspective. The study introduced QSPR models built with the Gradient Boosting algorithm. The GB_A model, with 8 descriptors, showed the better performance, with R²train = 0.938 and R²test = 0.802, while the GB_B model, with 6 descriptors, showed R²train = 0.822 and R²test = 0.704. The validity of the models was further supported by additional statistical measures, such as MAE and RMSE. The contribution of each descriptor to dielectric permittivity was discussed using the Accumulated Local Effects (ALE) approach, which worked well for analyzing the individual influence of each descriptor on the predictions. In total, 5 descriptors in the QSPR-GBR models showed strong positive effects on dielectric permittivity, while one common descriptor (MLOGP2) showed a negative effect. It is important to note that TDB09m was also present in both models, with a positive effect. In conclusion, this study demonstrated a suitable approach for predicting dielectric constants across a wide range of polymers using non-linear models. The ability to predict the dielectric constant with models whose structure-property relationships can be interpreted through ALE plots not only supports the design of polymers with specific electrical properties, but also accelerates the development of polymeric materials for practical applications, reducing the need for costly and lengthy experiments.
Author Contributions
Conceptualization, B.R.; Methodology, E.A., S.H., A.D., K.I. and G.C.; Validation, E.A.; Formal Analysis, E.A., G.C. and B.R.; Investigation, E.A.; Resources, G.C.; Data Curation, E.A. and G.C.; Writing – Original Draft Preparation, E.A.; Writing – Review & Editing, G.C. and B.R.; Visualization, E.A. and G.C.; Supervision, S.A., H.G.D. and B.R.; Project Administration, B.R.; Funding Acquisition, S.A., H.G.D. and B.R. All authors have read and agreed to the published version of the manuscript.