Exploring the Symmetry of Curvilinear Regression Models for Enhancing the Analysis of Fibrates Drug Activity through Molecular Descriptors

Suha Wazzan; Nurten Urlu Ozalan

doi:10.20944/preprints202305.0096.v1

Submitted: 02 May 2023

Posted: 03 May 2023


Abstract

This paper uses topological indices to predict the physicochemical properties and biological activities of cholesterol-lowering drugs, specifically Fibrates. Fibrates are known to lower high triglycerides, increase HDL cholesterol, and reduce the small dense fraction of LDL cholesterol. The study uses a quantitative structure-property relationship (QSPR) approach, in which the relationships between physicochemical properties and topological indices are analyzed using curvilinear regression. The QSPR model predicts the physicochemical properties of the drugs from degree-based and distance-based topological indices. Density functional theory (DFT) calculations at the B3LYP/6-31G(d,p) level were also carried out on the four investigated derivatives to gain insight into their optimized geometries, DOS plots, and HOMO and LUMO orbital energies and distributions. The theoretical results suggest that topological indices in QSPR models could provide a powerful tool for predicting the physicochemical properties and biological activities of molecules, including drugs. These findings could lead to the development of new cholesterol-lowering drugs with desirable properties.

Keywords: Topological indices; Fibrates; Curvilinear regression; QSPR analysis

Subject: Computer Science and Mathematics - Mathematics

1. Introduction

Pharmacology has evolved rapidly, resulting in the introduction of numerous groundbreaking drugs each year. However, accurate testing requires appropriate equipment, good working relationships, and sufficient resources. Previous studies have shown that a drug's chemical properties are intricately linked to its molecular structure. Pharmacological and medical researchers often use topological indices to examine the properties of molecules and understand their impact on experimental outcomes. Hence, topological index computation is a useful tool for developing countries, allowing them to gather medical and biological data on upcoming drugs without the need for laboratory tests; see, for example, [1,2,3,4].

Fibrates are a type of medication that have been shown to lower high levels of bad cholesterol (also known as low-density lipoprotein or LDL), increase good cholesterol (also known as high-density lipoprotein or HDL), and decrease the amount of small dense LDL particles in the blood. They have been found to be effective in reducing the mortality and morbidity associated with cardiovascular disease (CVD) in individuals who are at risk for developing it. However, conducting laboratory studies to investigate the physicochemical properties of fibrates can be both expensive and time-consuming. To overcome this challenge, chemists can use topological indices to derive mathematical equations that provide valuable insights into the properties of fibrates. For more information on fibrates, please refer to sources [5] and [6].

Chemical graph theory is a field that integrates mathematical modeling of chemical phenomena with graph theory. It uses topological indices to establish a correlation between the properties of a chemical molecule and its structure [7]. These indices are also known as graph invariants or graph-based molecular descriptors, and they quantify the topological features of a molecule or molecules [8]. Quantitative structure-property/structure-activity relationship (QSPR/QSAR) models, which are commonly employed in this field, allow molecular properties to be predicted from these topological indices. In 1947, Harold Wiener introduced the Wiener index, the first topological index, and used it to determine the physical properties of paraffins [9].

Topological indices, which are numerical values derived from the molecular graph of a chemical compound, have been extensively studied in the fields of quantitative structure-property relationship (QSPR) and quantitative structure-activity relationship (QSAR) analyses. These indices encode the structural and topological information of molecules and have proven useful in predicting various physical, chemical, and biological properties [10,11,12,13,14]. The use of molecular graphs to represent unsaturated hydrocarbon structures provides a more intuitive and comprehensive understanding of the molecular characteristics and behavior of compounds [15,16,17,18,19,20]. In drug design, knowledge of molecular structure is essential in determining their potential therapeutic activity and overall effectiveness. In this study, we examine several vertex-degree based topological indices, including the first and second Zagreb indices, hyper-Zagreb index, sigma index, Inverse symmetric deviation index, Max-min rodeg index, Min-max rodeg index, Inverse sum deviation index, Atom-bond connectivity index, Randic index, and Albertson index [21,22,23,24,25,26,27,28,29,30,31]. Additionally, we investigate topological indices based on distance, such as Wiener index, Schultz index, Harary index, and Gutman index [32,33,34]. These indices are used to classify the molecular descriptors and analyze the efficacy of curvilinear regression models in predicting the activity of fibrates drugs.

Molecular descriptors have been widely used to evaluate the physicochemical and bioactive properties of chemical structures, and their inclusion in curvilinear regression models can enhance the analysis of drug activity. Topological indices, such as the Zagreb indices, have shown promise in predicting the effectiveness of cancer treatments [35]. The max-min rodeg index has been found to give reliable predictions for octane isomers and polychlorobiphenyls in linear regression models [36]. The Atom-bond connectivity index has been proposed to determine the complexity of alkanes [37]. The first hyper-Zagreb index has been found to be the preferred index for estimating the boiling points of benzenoid hydrocarbons [38]. Additionally, the indeg indices have been applied to predict topological polar areas [39]. The inverse sum deviation index has been used to calculate the vaporization and sublimation enthalpies of monocarboxylic acids [40,41]. Irregularity indices based on different degrees, in addition to the Albertson and Sigma indices, have been found to predict the physicochemical properties of octane isomers [42]. The Wiener index was the first index introduced in quantitative structure-property relationship (QSPR) studies, and it has been shown to align well with the boiling points of alkanes [43]. The Wiener index has been further developed and used to explain different chemical and physical properties of molecules, as well as their biological activity [44]. The Schultz index has also been investigated to predict the boiling points of alkyl alcohols, and thus their suitability for various applications [45]. The mathematical expressions of these indices are given in Table 1.

Fenofibrate is used, alongside a healthy diet, to reduce blood cholesterol and triglyceride levels. By decreasing triglyceride levels in the bloodstream, it can mitigate the risk of pancreatitis (inflammation of the pancreas). To date, only one paper [46] has explored the use of topological indices in analyzing one of the drugs in the Fibrates family. That study used $ve$-degree, $ev$-degree, and degree-based ($D$-based) approaches to compute the topological indices of fenofibrate's chemical structure. Given the limited literature on Fibrates that incorporates topological indices, this paper represents a pioneering effort in the investigation of novel physicochemical properties of Fibrates using this technique. In this work, Fenofibrate ($C_{20}H_{21}ClO_{4}$), Ciprofibrate ($C_{13}H_{14}Cl_{2}O_{3}$), Bezafibrate ($C_{19}H_{20}ClNO_{4}$), and Clofibrate ($C_{12}H_{15}ClO_{3}$), drugs used in the treatment of patients with high cholesterol, are studied.

Fibrates drugs are a class of medications commonly used to treat dyslipidemia, a condition characterized by abnormal lipid levels in the blood. Despite their widespread use, the molecular mechanisms underlying the activity of fibrates drugs are not well understood. One approach to addressing this challenge is to develop quantitative structure-activity relationship (QSAR) models that can predict the activity of fibrates drugs based on their molecular descriptors. In this study, we investigate the efficacy of curvilinear regression models in enhancing the analysis of fibrates drug activity through molecular descriptors. Curvilinear regression models are a type of non-linear regression model that can capture non-linear relationships between variables, making them useful for analyzing complex systems such as the interactions between drugs and their molecular targets. Our study builds upon previous research that has investigated the use of QSAR models to predict the activity of drugs. Several articles published in Symmetry have explored the use of topological indices and other mathematical methods to predict various properties of organic compounds, including their biological activity. For example, a study by Liu et al. [47] investigated the efficacy of using topological indices in QSPR models for predicting the densities and viscosities of biodiesel. The authors used a dataset of 105 biodiesel compounds with known properties and developed models using multiple linear regression and artificial neural network methods. They compared the performance of their models with previous studies and found that the models developed using topological indices had higher accuracy in predicting the properties of biodiesel. Zuo and Hu [48] developed QSPR models for predicting the melting points of organic compounds using molecular topology and quantum chemical descriptors. 
The authors used a dataset of 893 organic compounds and developed multiple linear regression models using the partial least squares (PLS) method. They compared their models with other models reported in the literature and found that their models were more accurate in predicting the melting points of organic compounds. Zhang et al. [49] developed QSPR models for predicting the melting points of organic compounds based on molecular topology. The authors used a dataset of 1,427 organic compounds and developed models using the neural network algorithm. They compared their models with other models reported in the literature and found that their models were more accurate in predicting the melting points of organic compounds. Naghipour and Kiasat [50] developed a QSPR model for predicting the fullerene-like behavior of C60 derivatives using topological indices. The authors used a dataset of 46 C60 derivatives with known fullerene-like behavior and developed a model using multiple linear regression. They compared their model with other models reported in the literature and found that their model had higher accuracy in predicting the fullerene-like behavior of C60 derivatives. Wang and Xu [51] developed QSPR models for predicting the boiling points of alkyl alkanes based on the novel vertex degree valence topological index. The authors used a dataset of 388 alkyl alkanes and developed models using multiple linear regression and artificial neural network methods. They compared their models with other models reported in the literature and found that their models were more accurate in predicting the boiling points of alkyl alkanes.

In our study, we apply curvilinear regression models to analyze the activity of fibrates drugs based on their molecular descriptors. By incorporating non-linear relationships between variables, we aim to enhance the accuracy and predictive power of QSAR models for analyzing the activity of fibrates drugs. Ultimately, our research may contribute to a better understanding of the molecular mechanisms underlying the activity of fibrates drugs, and to the development of more effective treatments for dyslipidemia. These studies demonstrate the usefulness of QSAR modeling and related techniques for predicting the activity of various compounds based on their molecular descriptors. By building on this previous work, we hope to further advance our understanding of the molecular mechanisms underlying the activity of fibrates drugs.

The QSPR model is a highly effective tool for predicting a wide range of physicochemical properties of drugs. To make these predictions, the model employs degree-based and distance-based topological indices (as detailed in Table 1). The properties considered include Polarizability, Sum of electronic and zero-point Energies, Sum of electronic and thermal Energies, Sum of electronic and thermal Enthalpies, Sum of electronic and thermal Free Energies, Zero-point vibrational energy, Complexity, Topological polar area, Dipole moment, Heat capacity, Molar entropy, and Octanol-water partition coefficients. To analyze the relationships between these properties and the topological indices, curvilinear regression (linear, quadratic, and cubic) is used. The statistical parameters of the models are computed with SPSS and MATLAB statistical functions. In addition, DFT calculations are conducted at the B3LYP/6-31G(d,p) level to gain insight into the optimized geometries, DOS plots, and HOMO and LUMO orbital energies and distributions of the four derivatives studied; these are presented in the next section. Section 3 examines the contributions of different topological indices as molecular structural descriptors. Finally, Section 4 concludes the paper.

2. DFT Part

Figure 1, Figure 2, Figure 3, and Figure 4 show four important characteristics of the four investigated Fibrate derivatives:

(1) optimized geometries,

(2) electron density mapped with electrostatic potential (ESPM),

(3) total density of states (DOS) plots, and

(4) the spatial distributions of the highest occupied molecular orbitals (HOMOs) and the lowest unoccupied molecular orbitals (LUMOs).

Density functional theory (DFT) calculations on the investigated Fibrate derivatives used one of the well-known hybrid functionals, the Becke three-parameter Lee–Yang–Parr functional (B3LYP). In DFT, hybrid functionals incorporate a portion of Hartree-Fock exchange, together with exchange and correlation contributions from other sources (empirical or ab initio), to approximate the exchange-correlation energy. B3LYP, which defines the Hamiltonian term in the Schrödinger equation, was combined with the 6-31G(d,p) basis set, which represents the wavefunction. This is a moderate double-zeta ($ζ$) basis set augmented with two kinds of polarization functions: a d-function on heavy atoms (carbon, nitrogen, oxygen, and chlorine) and a p-function on all hydrogen atoms. Most of the physicochemical properties of the investigated fibrate derivatives discussed in the next section were obtained from frequency calculations carried out at the same level of theory as the optimization. Calculations were carried out using the Gaussian 09 software suite [52]. Molecular structures were visualized with GaussView (version 5.0.8) [53], ESPMs were drawn with the Avogadro package [54], and the GaussSum program [55] was used to produce the DOS plots.

ESPMs show how electron density is distributed in the four non-planar molecules in terms of the electrostatic potential. They identify the regions of a molecule with the highest or lowest electron density, which are therefore the most likely to be attacked by electrophilic or nucleophilic agents, respectively. Regions prone to nucleophilic attack are shown in blue (positively charged) and regions prone to electrophilic attack in red (negatively charged). The red color is concentrated on the more electronegative atoms, such as oxygen (deep red) and chlorine (light red); the blue color covers the hydrogen atoms (the least electronegative atoms); and the carbon atoms are covered by white, indicating the intermediate electronegativity of carbon. ESPMs thus make it possible to determine the position and region in a molecule attacked by an electrophile or nucleophile. The DOS plot of a molecule indicates how many states electrons are allowed to occupy at each energy in the system.

The HOMO energies of the four investigated Fibrate derivatives are $-6.230$, $-6.108$, $-6.166$, and $-6.422$ eV for Fenofibrate, Ciprofibrate, Bezafibrate, and Clofibrate, respectively. Since the HOMO energy is used as a measure of the electron-donating power of a molecule, a destabilized (less negative) HOMO implies a greater ability to donate electrons. The electron-donating ability of the four derivatives can therefore be ranked as follows: Ciprofibrate > Bezafibrate > Fenofibrate > Clofibrate. On the other hand, the LUMO energy measures the electron-accepting ability of a molecule, and a stabilized (more negative) LUMO implies a greater ability to accept electrons. The derivatives' ability to accept electrons is therefore: Fenofibrate ($-1.720$ eV) > Bezafibrate ($-1.220$ eV) > Clofibrate ($-0.461$ eV) > Ciprofibrate ($-0.457$ eV). The energy gap (the LUMO energy minus the HOMO energy) measures chemical reactivity: the smaller the gap, the more reactive the molecule. The gaps increase in the order Fenofibrate ($4.51$ eV) < Bezafibrate ($4.95$ eV) < Ciprofibrate ($5.65$ eV) < Clofibrate ($5.96$ eV), so Fenofibrate is the most reactive and Clofibrate the least. Finally, the 2D spatial distribution of the HOMO and LUMO orbitals is another indicator of the position or region subject to electrophilic and nucleophilic attack. The HOMO and LUMO orbitals in Ciprofibrate are distributed over similar parts of the molecule, except that the two chlorine atoms have more HOMO character. In the other molecules, the HOMO orbitals are delocalized over different regions from the LUMO orbitals.
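The frontier-orbital rankings in this section follow from simple arithmetic on the reported energies. The following is a minimal sketch of that arithmetic, assuming only the HOMO and LUMO values quoted in the text; the variable names are illustrative and are not taken from the Gaussian output.

```python
# Frontier-orbital energies in eV, as quoted in the text of this section.
homo = {"Fenofibrate": -6.230, "Ciprofibrate": -6.108,
        "Bezafibrate": -6.166, "Clofibrate": -6.422}
lumo = {"Fenofibrate": -1.720, "Ciprofibrate": -0.457,
        "Bezafibrate": -1.220, "Clofibrate": -0.461}

# Energy gap = LUMO energy minus HOMO energy.
gap = {name: lumo[name] - homo[name] for name in homo}

# Least negative HOMO first: strongest electron donor first.
donor_order = sorted(homo, key=homo.get, reverse=True)
# Most negative LUMO first: strongest electron acceptor first.
acceptor_order = sorted(lumo, key=lumo.get)
# Smallest gap first: most reactive first.
reactivity_order = sorted(gap, key=gap.get)

for name in reactivity_order:
    print(f"{name}: gap = {gap[name]:.2f} eV")
```

Sorting by gap puts the most reactive derivative first, which makes the direction of the reactivity ranking explicit.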

3. Materials and Method

In this section, the overall objective is to establish a quantitative structure-property relationship (QSPR) between the various topological indices and some physicochemical properties and activities of the Fibrates drugs under study, in order to assess the effectiveness of these drugs. Eleven degree-based and four distance-based topological indices were used for the modeling. The DFT calculations were based on DMol3-optimized geometries of the investigated Fibrates drugs and were performed with version 8.0 of Materials Studio from BIOVIA. The properties considered are Polarizability ($P$), Sum of electronic and zero-point Energies ($SEZ_{P}E$), Sum of electronic and thermal Energies ($SET_{Energy}$), Sum of electronic and thermal Enthalpies ($SET_{Enthalpy}$), Sum of electronic and thermal Free Energies ($SETF_{Energy}$), Zero-point vibrational energy ($Z_{P}VE$), Complexity ($C$), Topological polar area ($TPA$), Dipole moment ($DM$), Heat capacity ($CV$), Molar entropy ($S$), and Octanol-water partition coefficient ($XLogP3$) of several drugs used in the treatment of high cholesterol, namely Fenofibrate, Ciprofibrate, Bezafibrate, and Clofibrate. Curvilinear regression analysis fits curves instead of straight lines; SPSS statistical software is used to carry out the curvilinear regressions. The independent variables in the curvilinear regression models are the topological indices computed for the cholesterol-lowering drugs. The following equations are tested:

\begin{matrix} y = a + b_{1} x; & n, R^{2}, F, Se, SF & \text{(Linear equation)} & (1) \\ y = a + b_{1} x + b_{2} x^{2}; & n, R^{2}, F, Se, SF & \text{(Quadratic equation)} & (2) \\ y = a + b_{1} x + b_{2} x^{2} + b_{3} x^{3}; & n, R^{2}, F, Se, SF & \text{(Cubic equation)} & (3) \end{matrix}

In this context, $y$ represents the response or dependent variable, $a$ denotes the regression model constant, and $b_{i}$ $(i = 1, 2, 3)$ are the coefficients of the descriptor terms. The independent variable is represented by $x$, and $n$ is the number of samples used to build the regression equation. $R^{2}$ denotes the coefficient of determination, $R$ the correlation coefficient, $F$ the calculated value of the Fisher $F$-test, $Se$ the standard error of the estimate, and $SF$ the $F$-significance. When the experimental and theoretical results are close to each other, the correlation coefficient approaches 1. To gauge the predictive ability of a model, the observed values are compared with the model predictions using the Root Mean Square Error ($RMSE$). The lower the $RMSE$, the higher the predictive quality of the model. It is calculated as follows:

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{n}} \qquad (4)

where $y_{i}$ is the observed value of the dependent variable in the test set, $\hat{y}_{i}$ is the predicted value of the dependent variable in the test set, and $n$ is the number of samples in the test set. The topological indices serve as independent variables. To evaluate our initial model, we used the $RMSE$ metric and then normalized the data to improve the accuracy of our predictions. The $RMSE$ score, which measures the difference between predicted and actual values, revealed that our model needed improvement. To address issues such as outliers and varying scales of measurement that could negatively affect model performance, we applied normalization techniques to our data. Normalization was essential in improving the model's accuracy, as it scaled the variables to a common range, reduced the impact of outliers, and ensured that all variables were weighted equally. After normalization, we re-evaluated the model using the $RMSE$ metric, and the updated score showed a significant improvement in the accuracy of our predictions.

The computed values of the topological indices are shown in Table 2. We compute the values using combinatorial computations and edge partitioning as follows, where $E_{i, j}$ denotes the set of edges joining a vertex of degree $i$ to a vertex of degree $j$.

The molecular graph of Fenofibrate has 25 vertices and 26 edges. Its edges can be partitioned as $|E_{1, 4}| = 2$, $|E_{1, 3}| = 5$, $|E_{2, 3}| = 11$, $|E_{2, 2}| = 4$, $|E_{3, 4}| = 1$, $|E_{3, 3}| = 2$, and $|E_{2, 4}| = 1$.

The molecular graph of Ciprofibrate has 18 vertices and 19 edges. Its edges can be partitioned as $|E_{2, 2}| = 2$, $|E_{1, 4}| = 4$, $|E_{2, 4}| = 2$, $|E_{1, 3}| = 2$, $|E_{2, 3}| = 6$, $|E_{1, 4}| = 2$, and $|E_{1, 3}| = 1$.

The molecular graph of Bezafibrate has 25 vertices and 26 edges. Its edges can be partitioned as $|E_{1, 3}| = 4$, $|E_{2, 3}| = 11$, $|E_{2, 2}| = 6$, $|E_{3, 3}| = 1$, $|E_{2, 4}| = 1$, $|E_{3, 4}| = 1$, and $|E_{1, 4}| = 2$.

The molecular graph of Clofibrate has 16 vertices and 16 edges. Its edges can be partitioned as $|E_{1, 3}| = 2$, $|E_{2, 3}| = 6$, $|E_{1, 2}| = 1$, $|E_{2, 2}| = 3$, $|E_{3, 4}| = 1$, $|E_{1, 4}| = 2$, and $|E_{2, 4}| = 1$.
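Edge partitioning reduces every degree-based index to a weighted count: each class of edges contributes its size times a weight that depends only on the two end-vertex degrees. The following is a minimal sketch using the Fenofibrate partition listed above; the helper name `index_from_partition` is hypothetical, and the edge weights are the standard definitions of the first and second Zagreb, hyper-Zagreb, Albertson, and Randic indices.

```python
import math

# Edge partition of the Fenofibrate molecular graph as listed in the text:
# (degree of one end vertex, degree of the other) -> number of such edges.
fenofibrate = {(1, 4): 2, (1, 3): 5, (2, 3): 11, (2, 2): 4,
               (3, 4): 1, (3, 3): 2, (2, 4): 1}


def index_from_partition(partition, edge_weight):
    """Sum edge_weight(du, dv) over all edges, one degree class at a time."""
    return sum(count * edge_weight(du, dv)
               for (du, dv), count in partition.items())


n_edges = sum(fenofibrate.values())  # sanity check against the stated edge count
first_zagreb = index_from_partition(fenofibrate, lambda du, dv: du + dv)
second_zagreb = index_from_partition(fenofibrate, lambda du, dv: du * dv)
hyper_zagreb = index_from_partition(fenofibrate, lambda du, dv: (du + dv) ** 2)
albertson = index_from_partition(fenofibrate, lambda du, dv: abs(du - dv))
randic = index_from_partition(fenofibrate, lambda du, dv: 1 / math.sqrt(du * dv))

print(n_edges, first_zagreb, second_zagreb, hyper_zagreb, albertson, round(randic, 4))
```

Checking that the class sizes sum to the stated number of edges is a cheap way to catch a transcription error in a partition before any index is computed from it.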

Using MATLAB, degree-based and distance-based topological indices can be computed efficiently, as explained in Algorithm 1 and Algorithm 2, which implement the mathematical expressions of the indices. The drugs of the Fibrates family under consideration, namely Fenofibrate, Ciprofibrate, Bezafibrate, and Clofibrate, are presented in Table 3, together with their experimental data [52] and the optimized geometries obtained through DFT calculations using the DMol3 module of version 8.0 of Materials Studio from BIOVIA. Table 4 shows the correlation coefficient ($R$) between the degree-based topological indices and some physicochemical properties, computed using a linear regression model. The quadratic regression model is used in Table 6 to calculate the correlation coefficient ($R$) between these indices and some physicochemical properties, and the cubic model is employed for this purpose in Table 8. Similarly, linear, quadratic, and cubic regression models are used for the distance-based topological indices, and the results are presented in Table 10. Once the correlation coefficients for a physicochemical property are obtained, the model with the maximum $R$ is taken as the most accurate predictor of that property. These models are indicated in Table 5, Table 7, Table 9, and Table 11. Computing topological indices in this way and using them to predict the physicochemical properties of molecules can be useful in various fields, including drug discovery and materials science.
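The linear, quadratic, and cubic fits and their comparison can be sketched in plain Python. This is a minimal sketch and not the SPSS or MATLAB procedure used in the study: the function names `polyfit_least_squares`, `predict`, and `fit_statistics` and the sample data are hypothetical, and the $RMSE$ is evaluated on the fitting data for brevity.

```python
import math


def polyfit_least_squares(x, y, degree):
    """Return [a, b1, ..., b_degree] minimising the sum of squared residuals."""
    m = degree + 1
    # Normal equations (X^T X) c = X^T y for the polynomial design matrix X.
    aug = [[sum(xi ** (r + c) for xi in x) for c in range(m)]
           + [sum(yi * xi ** r for xi, yi in zip(x, y))] for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(aug[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (aug[r][m] - tail) / aug[r][r]
    return coeffs


def predict(coeffs, xi):
    """Evaluate a + b1*x + b2*x^2 + ... at xi."""
    return sum(c * xi ** k for k, c in enumerate(coeffs))


def fit_statistics(x, y, degree):
    """Fit a polynomial of the given degree and report R^2 and RMSE."""
    coeffs = polyfit_least_squares(x, y, degree)
    y_hat = [predict(coeffs, xi) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return {"coeffs": coeffs, "R2": 1 - ss_res / ss_tot,
            "RMSE": math.sqrt(ss_res / len(y))}


# Hypothetical index values (x) and property values (y), for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 9.2, 15.8, 25.1, 35.9]
results = {degree: fit_statistics(x, y, degree) for degree in (1, 2, 3)}
for degree, stats in results.items():
    print(degree, round(stats["R2"], 4), round(stats["RMSE"], 4))
```

Because each model contains the previous one as a special case, $R^{2}$ on the fitting data cannot decrease as the degree rises, so a higher $R$ for a higher-degree model is partly a consequence of the extra coefficients.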

Algorithm 1 Computational procedure for calculating degree-based indices

Input: Edges and nodes of the molecule

Output: $e \leftarrow$ Topological indices vector

Step 1. Start

Step 2. $G \leftarrow$ Graph of undirected edges

Step 3. $A \leftarrow$ Adjacency matrix of $G$

Step 4. $d \leftarrow$ Distances of $G$

Step 5. $d_{1} \leftarrow$ Vertex degrees of $G$

Step 6. Calculate the size of matrix $d$

Step 7. Construct $A_{N}$:

for $i = 1$ to number of columns do

for $j = 1$ to number of rows do

if $i = j$ then

$A_{N}(i, j) = 0$

elseif $A(i, j) = 1$ then (use the line for the index being computed)

$A_{N}(i, j) = d_{1}(i) + d_{1}(j)$ (First Zagreb index)

$A_{N}(i, j) = d_{1}(i) \cdot d_{1}(j)$ (Second Zagreb index)

$A_{N}(i, j) = (d_{1}(i) + d_{1}(j))^{2}$ (Hyper-Zagreb index)

$A_{N}(i, j) = \sqrt{\frac{d_{1}(i) + d_{1}(j) - 2}{d_{1}(i) \cdot d_{1}(j)}}$ (Atom-bond connectivity index)

$A_{N}(i, j) = \frac{1}{\sqrt{d_{1}(i) \cdot d_{1}(j)}}$ (Randic index)

$A_{N}(i, j) = \sqrt{\frac{\min(d_{1}(i), d_{1}(j))}{\max(d_{1}(i), d_{1}(j))}}$ (Min-max rodeg index)

$A_{N}(i, j) = \sqrt{\frac{\max(d_{1}(i), d_{1}(j))}{\min(d_{1}(i), d_{1}(j))}}$ (Max-min rodeg index)

$A_{N}(i, j) = |d_{1}(i) - d_{1}(j)|$ (Albertson index)

$A_{N}(i, j) = (d_{1}(i) - d_{1}(j))^{2}$ (Sigma index)

$A_{N}(i, j) = \frac{d_{1}(i) \cdot d_{1}(j)}{d_{1}(i)^{2} + d_{1}(j)^{2}}$ (Inverse symmetric deg index)

$A_{N}(i, j) = \frac{d_{1}(i) \cdot d_{1}(j)}{d_{1}(i) + d_{1}(j)}$ (Inverse sum deg index)

end if

end for

end for

Step 8. $e = (\text{summation of } A_{N}) / 2$.
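Algorithm 1 can be sketched in Python as follows. This is an illustrative translation under stated assumptions, not the authors' MATLAB code: the names `EDGE_WEIGHTS` and `degree_based_indices` are hypothetical, the graph is given as an edge list, and each edge is visited once, which is equivalent to summing the symmetric matrix $A_{N}$ and dividing by 2. The example graph is the hydrogen-suppressed skeleton of 2-methylbutane, chosen only because it is small enough to check by hand.

```python
import math
from collections import Counter

# Edge weight for each index, as a function of the two end-vertex degrees.
EDGE_WEIGHTS = {
    "first_zagreb": lambda a, b: a + b,
    "second_zagreb": lambda a, b: a * b,
    "hyper_zagreb": lambda a, b: (a + b) ** 2,
    "atom_bond_connectivity": lambda a, b: math.sqrt((a + b - 2) / (a * b)),
    "randic": lambda a, b: 1 / math.sqrt(a * b),
    "min_max_rodeg": lambda a, b: math.sqrt(min(a, b) / max(a, b)),
    "max_min_rodeg": lambda a, b: math.sqrt(max(a, b) / min(a, b)),
    "albertson": lambda a, b: abs(a - b),
    "sigma": lambda a, b: (a - b) ** 2,
    "inverse_symmetric_deg": lambda a, b: a * b / (a ** 2 + b ** 2),
    "inverse_sum_deg": lambda a, b: a * b / (a + b),
}


def degree_based_indices(edges):
    """Return every index in EDGE_WEIGHTS for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Visiting each edge once equals summing the symmetric A_N and halving.
    return {name: sum(weight(degree[u], degree[v]) for u, v in edges)
            for name, weight in EDGE_WEIGHTS.items()}


# Hypothetical test molecule: carbon skeleton of 2-methylbutane.
example_edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
indices = degree_based_indices(example_edges)
for name, value in indices.items():
    print(f"{name}: {value:.4f}")
```

Keeping the weights in one table means that adding another degree-based index only requires one more entry, with no change to the traversal.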

Algorithm 2 Computational procedure for calculating distance-based indices

Input: Edges and nodes of the molecule

Output: $e \leftarrow$ Topological indices vector

Step 1. Start

Step 2. $G \leftarrow$ Graph of undirected edges

Step 3. $A \leftarrow$ Adjacency matrix of $G$

Step 4. $d \leftarrow$ Distances of $G$

Step 5. $d_{1} \leftarrow$ Vertex degrees of $G$

Step 6. Calculate the size of matrix $d$

Step 7. Construct $A_{N}$:

for $i = 1$ to number of rows $- 1$ do

$aa = 0$;

for $j = i + 1$ to number of columns do (use the line for the index being computed)

$aa = aa + d(i, j)$ (Wiener index)

$aa = aa + d(i, j) \cdot (d_{1}(i) + d_{1}(j))$ (Schultz index)

$aa = aa + \frac{1}{d(i, j)}$ (Harary index)

$aa = aa + d(i, j) \cdot (d_{1}(i) \cdot d_{1}(j))$ (Gutman index)

end for

$A_{N}(i) = aa$

end for

Step 8. $e =$ summation of $A_{N}$.
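Algorithm 2 can be sketched in the same way. The following is a minimal Python sketch under stated assumptions, not the authors' MATLAB code: the function name `distance_based_indices` is hypothetical, the graph is assumed to be connected (as a molecular graph is), and shortest-path distances are obtained by breadth-first search from every vertex. The example graph is again the carbon skeleton of 2-methylbutane.

```python
from collections import deque
from itertools import combinations


def distance_based_indices(edges):
    """Return Wiener, Schultz, Harary, and Gutman indices of a connected graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    degree = {v: len(neighbours) for v, neighbours in adjacency.items()}

    def shortest_paths(source):
        # Breadth-first search gives shortest-path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        return dist

    dist = {v: shortest_paths(v) for v in adjacency}
    totals = {"wiener": 0, "schultz": 0, "harary": 0.0, "gutman": 0}
    # Each unordered vertex pair is visited once (j > i in Algorithm 2).
    for u, v in combinations(adjacency, 2):
        d = dist[u][v]
        totals["wiener"] += d
        totals["schultz"] += d * (degree[u] + degree[v])
        totals["harary"] += 1 / d
        totals["gutman"] += d * degree[u] * degree[v]
    return totals


# Hypothetical test molecule: carbon skeleton of 2-methylbutane.
example_edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
totals = distance_based_indices(example_edges)
print(totals)
```

Breadth-first search from every vertex costs on the order of the number of vertices times the number of edges, which is negligible for graphs of the size considered here.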

3.1. Results and Discussion

Numerous topological indices are used to predict the properties of the Fibrates drugs. In the QSPR analysis, linear, quadratic, and cubic regression models are examined. Several topological indices based on vertex degrees and on distances between vertices are calculated for the Fibrates drugs. The models are analyzed using twelve descriptors and thirteen topological indices. The correlation coefficients ($R$) between these indices and some physicochemical properties obtained with the linear regression model are given in Table 4, and those obtained with the quadratic regression model are given in Table 6. For each physicochemical property, the model with the maximum $R$ is taken as the most accurate predictor. In Table 4, we display the maximum $R$ for each physicochemical property, based on the analysis of the data (linear and quadratic). For convenience, values less than $0.64$ have been excluded from Table 4 and Table 6.

Table 4. The correlation coefficient (R) obtained by linear regression model between topological indices and physicochemical properties of various drugs of Fibrates.

T.I.	$SEZ_{P}E$, $SET_{Energy}$, $SET_{Enthalpy}$, $SETF_{Energy}$	$XLogP3$	$C$	$TPA$
$M_{1}(ζ)$	$-0.902$	$0.74$	$1$	$0.811$
$M_{2}(ζ)$	$-0.941$	$0.729$	$0.995$	$0.786$
$H(ζ)$	$-0.89$	$0.771$	$0.998$	$0.791$
$ABC(ζ)$	$-0.84$	$0.748$	$0.992$	$0.826$
$R(ζ)$	$-0.765$	$0.746$	$0.967$	$0.826$
$mM_{sde}(ζ)$	$-0.914$	$0.833$	$0.985$	$0.705$
$Mm_{sde}(ζ)$	$-0.796$	$0.758$	$0.978$	$0.819$
$irr(ζ)$	$-0.999$	$0.647$	$0.887$	$-$
$σ(ζ)$	$-0.848$	$-$	$-$	$-$
$ISDI(ζ)$	$-0.669$	$0.736$	$0.922$	$0.805$
$ISI(ζ)$	$-0.855$	$0.75$	$0.995$	$0.821$

Table 5 lists the topological indices that give the best estimates of the physicochemical properties with linear regression models. A diagram depicting this is shown in Figure 5.

Table 7 lists the topological indices that give the best estimates of the physicochemical properties with quadratic regression models; only topological indices with $R^{2} \geq 0.8$ are considered. A diagram depicting this is shown in Figure 6.

Table 5. Linear regression models that give the best estimate for physicochemical properties.

Linear regression model	$R^{2}$	$F$	$Se$	$SF$	$RMSE$
$SEZ_{P}E = -162.126 - (49.436) irr(ζ)$	$0.999$	$1747.706$	$9.083$	$0.0005$	$6.4227673$
$XLogP3 = 0.226 + (0.128) mM_{sde}(ζ)$	$0.639$	$4.522$	$0.595$	$0.167$	$0.4953885$
$C = -113.023 + (4.545) M_{1}(ζ)$	$1$	$13088.633$	$1.632$	$0.000076$	$1.1555164$
$TPA = -8.603 + (3.820) ABC(ζ)$	$0.682$	$4.293$	$16.390$	$0.174$	$8.2595015$
$TPA = -7.735 + (6.164) R(ζ)$	$0.683$	$4.308$	$11.667$	$0.174$	$8.2495412$

Table 6. The correlation coefficient (R) obtained by quadratic regression model between topological indices and physicochemical properties of various drugs of Fibrates.

T.I.	$DM$	$P$	$Z_{P}VE$	$CV$	$S$	$XLogP3$	$C$	$TPA$	$SEZ_{P}E$, $SET_{Energy}$, $SET_{Enthalpy}$, $SETF_{Energy}$
$M_{1}$	$0.850$	$0.881$	$0.803$	$0.848$	$0.820$	$0.807$	$1.000$	$0.811$	$0.979$
$M_{2}$	$0.837$	$0.908$	$0.843$	$0.878$	$0.850$	$0.850$	$0.998$	$0.786$	$0.981$
$H$	$0.808$	$0.930$	$0.872$	$0.904$	$0.879$	$0.852$	$0.999$	$0.804$	$0.973$
$ABC$	$0.874$	$0.884$	$0.752$	$0.808$	$0.779$	$0.768$	$1.000$	$0.829$	$0.981$
$R$	$0.929$	$0.756$	$-$	$0.714$	$0.684$	$0.746$	$1.000$	$0.831$	$0.993$
$mM_{sde}$	$0.868$	$0.851$	$0.760$	$0.815$	$0.787$	$0.768$	$1.000$	$0.832$	$0.979$
$Mm_{sde}$	$0.746$	$0.997$	$0.984$	$0.990$	$0.979$	$0.964$	$0.990$	$0.796$	$0.971$
$irr$	$0.894$	$0.995$	$0.998$	$0.999$	$0.999$	$0.983$	$0.893$	$0.712$	$1.000$
$σ$	$0.947$	$0.708$	$-$	$0.667$	$-$	$0.677$	$0.996$	$0.882$	$0.991$
$ISDI$	$-$	$-$	$-$	$-$	$-$	$0.736$	$0.922$	$0.810$	$0.669$
$ISI$	$0.861$	$0.863$	$0.777$	$0.828$	$0.800$	$0.782$	$1.000$	$0.825$	$0.979$

Remark 1.

Initially, linear regression was attempted on all physicochemical properties using the degree-based topological indices. Seven of the 12 properties gave satisfactory correlation coefficients, which are presented in Table 4. For the remaining five properties, whose linear correlation coefficients were below $0.64$, the quadratic regression model was tested and retained wherever its correlation coefficient exceeded $0.64$ (Table 6). Note that the sum of the electronic and zero-point energies $(SEZ_{P}E)$, the sum of the electronic and thermal energies $(SET_{Energy})$, the sum of the electronic and thermal enthalpies $(SET_{Enthalpy})$, and the sum of the electronic and thermal free energies $(SETF_{Energy})$ have identical correlation coefficients, so only $(SEZ_{P}E)$ is listed in Table 5 and Table 7.
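The selection rule described in Remark 1 can be sketched as a small decision function. This is only an illustrative reading of the remark, not the authors' procedure: the function name `select_model` and the example coefficients are hypothetical, and only the $0.64$ threshold is taken from the text.

```python
THRESHOLD = 0.64  # cut-off quoted in Remark 1


def select_model(r_linear, r_quadratic, threshold=THRESHOLD):
    """Return the lowest-order model whose |R| reaches the threshold.

    Hypothetical helper: the linear model is preferred when it is
    already satisfactory; otherwise the quadratic model is tried.
    Returns None when neither model reaches the threshold.
    """
    if abs(r_linear) >= threshold:
        return "linear"
    if abs(r_quadratic) >= threshold:
        return "quadratic"
    return None


# Hypothetical correlation coefficients, for illustration only.
print(select_model(0.70, 0.80))
print(select_model(0.33, 0.99))
print(select_model(0.10, 0.20))
```

Under this reading, a property-index pair that fails both tests corresponds to an entry left blank in the correlation tables.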

Table 7. Quadratic regression models that give the best estimates for the physicochemical properties.

Quadratic regression model	$R^{2}$	F	$S e$	$S F$	$R M S E$
$D M = - 3.085 + (0.171) σ - (0.001) σ^{2}$	$0.897$	$4.360$	$0.473$	$0.321$	$0.4332951$
$P = \begin{matrix} - 1690.487 + (135.897) M m_{s - d e} \\ - (2.374) M m_{s - d e}^{2} \end{matrix}$	$0.995$	$97.646$	$6.116$	$0.071$	$3.057903$
$Z_{P} V E = \begin{matrix} - 2567.209 + (227.158) i r r \\ - (4.547) i r r^{2} \end{matrix}$	$0.996$	$135.526$	$4.387$	$0.061$	$2.193392$
$C V = \begin{matrix} - 937.144 + (82.838) i r r \\ - (1.646) i r r^{2} \end{matrix}$	$0.999$	$339.909$	$1.081$	$0.038$	$0.540583$
$S = \begin{matrix} - 1283.246 + (117.601) i r r \\ - (2.337) i r r^{2} \end{matrix}$	$0.999$	$443.193$	$1.346$	$0.034$	$0.732681$
$X L o g P 3 =$ $0.0763 i r r^{2} - 3.6225 i r r + 45.25$	$0.965$	$6.644$	$0.283$	$0.265$	$0.141424$
$C = \begin{matrix} - 0.0020091 M_{1}^{2} + 4.9546771 M_{1} \\ - 133.0233134 \end{matrix}$	$1.000$	$4059.09$	$2.07$	$0.01$	$1.036262$
$C = \begin{matrix} - 22.6208403 R^{2} + 485.9459637 R \\ - 2132.8797476 \end{matrix}$	$1.000$	$8639.90$	$1.42$	$0.01$	$0.710303$
$C = \begin{matrix} - 4.2704046 m M_{s - d e}^{2} + \\ 167.3955975 m M_{s - d e} \\ - 1182.6544670 \end{matrix}$	$0.999$	$811.32$	$4.63$	$0.02$	$2.317288$
$C = \begin{matrix} - 0.7828926 I S I^{2} + 55.0686241 I S I \\ - 478.1259314 \end{matrix}$	$1.000$	$3041.59$	$2.39$	$0.01$	$1.297084$
$C = \begin{matrix} 0.0217651 A B C^{2} - 0.9932150 A B C \\ + 161.4302698 \end{matrix}$	$0.999$	$155.21$	$10.58$	$0.06$	$5.291185$
$T P A = - 0.2245 σ^{2} + 22.318 σ - 487.45$	$0.778$	$1.751$	$13.810$	$0.471$	$6.90523$
$S E Z_{P} E = - 0.4068 i r r^{2} - 29.427 i r r - 400.68$	$0.999$	$558.642$	$11.362$	$0.030$	$5.68161$

The cubic model is used for all the physicochemical properties and degree-based topological indices in order to provide a comprehensive analysis. Table 8 presents the correlation coefficients, which are high as anticipated. Table 9 and Figure 7 display the best predictions of the properties.

Table 8. The correlation coefficient (R) obtained by cubic regression model between topological indices and physicochemical properties of various drugs of Fibrates.

T.I.	$\begin{matrix} (S E T_{E n e r g y}) \\ (S E T_{E n t h a l p y}) \\ (S E T F_{E n e r g y}) \\ (S E Z_{P} E) \end{matrix}$	P	C	$T P A$	$X L o g P 3$	S	$D M$	$C V$
$M_{1}$	$0.979$	$0.886$	$1.000$	$0.811$	$0.813$	$0.826$	$0.850$	$0.854$
$M_{2}$	$0.981$	$0.915$	$0.998$	$0.786$	$0.859$	$0.858$	$0.837$	$0.885$
H	$0.973$	$0.939$	$0.999$	$0.806$	$0.863$	$0.890$	$0.808$	$0.914$
$A B C$	$0.981$	$0.846$	$1.000$	$0.829$	$0.769$	$0.782$	$0.874$	$0.810$
R	$0.994$	$0.756$	$1.000$	$0.831$	$0.746$	$0.684$	$0.934$	$0.714$
$m M_{s - d e}$	$0.979$	$0.854$	$1.000$	$0.832$	$0.769$	$0.791$	$0.868$	$0.819$
$M m_{s - d e}$	$0.971$	$0.999$	$0.991$	$0.806$	$0.973$	$0.985$	$0.746$	$0.994$
$i r r$	$1.000$	$0.995$	$0.893$	$0.712$	$0.983$	$0.999$	$0.894$	$0.999$
$σ$	$0.992$	$0.708$	$0.998$	$0.882$	$0.689$	$0.645$	$0.948$	$0.667$
$I S D I$	$0.691$	$/$	$0.923$	$0.970$	$0.997$	$0.690$	$/$	$0.650$
$I S I$	$0.979$	$0.867$	$1.000$	$0.825$	$0.785$	$0.805$	$0.861$	$0.833$

Table 9. Cubic regression models that give the best estimates for the physicochemical properties.

Cubic regression model	$R^{2}$	F	$S e$	$S F$	$R M S E$
$S E Z_{P} E =$ $- 0.407 i r r^{2} - 29.427 i r r - 400.677$	$0.999$	$558.642$	$11.362$	$0.030$	$5.68160$
$P = \begin{matrix} - 0.0584988 M m_{s - d e}^{3} + \\ 2.5776095 M m_{s - d e}^{2} - \\ 1.62895 M m_{s - d e} - 438.67733 \end{matrix}$	$1.000$	$361.397$	$3.185$	$0.037$	$0.00032$
$C = \begin{matrix} - 0.001 M_{1}^{3} + 0.334 M_{1}^{2} - \\ 27.881 M_{1} + 915.803 \end{matrix}$	$1.000$	$4108.744$	$2.060$	$0.011$	$0.06846$
$C = \begin{matrix} 2.043 A B C^{3} - 94.407 A B C^{2} \\ + 1457.607 A B C - 7177.748 \end{matrix}$	$1.000$	$1315.359$	$3.640$	$0.019$	$0.00024$
$C = \begin{matrix} - 1.502 R^{3} + 18.604 R^{2} \\ + 116.427 R - 1047.001 \end{matrix}$	$1.000$	$4237.196$	$0.641$	$0.003$	$0.00006$
$C = \begin{matrix} 1.637 m M_{s - d e}^{3} - 81.094 m M_{s - d e}^{2} \\ + 1340.111 m M_{s - d e} - 7031.15 \end{matrix}$	$1.000$	$811.323$	$4.635$	$0.025$	$0.00005$
$C = \begin{matrix} 0.159 I S I^{3} - 11.343 I S I^{2} \\ + 283.834 I S I - 2095.506 \end{matrix}$	$1.000$	$3041.588$	$2.394$	$0.013$	$0.00052$
$T P A = \begin{matrix} - 51.090 I S D I^{3} + 1488.36 I S D I^{2} \\ - 14074.62 I S D I + 42794.32 \end{matrix}$	$1.000$	$7.894$	$7.151$	$0.244$	$0.00003$
$S =$ $- 2.337 i r r^{2} + 117.599 i r r - 1283.215$	$0.999$	$443.193$	$1.346$	$0.034$	$0.67175$
$X L o g P 3 =$ $\begin{matrix} - 0.900 I S D I^{3} + 24.127 I S D I^{2} \\ - 210.964 I S D I + 603.459 \end{matrix}$	$0.900$	$85.350$	$0.116$	$0.076$	$0.00001$
$D M = \begin{matrix} - 0.002 σ^{3} + 0.241 σ^{2} \\ - 11.686 σ + 186.850 \end{matrix}$	$1.000$	$4.396$	$0.471$	$0.320$	$0.00612$
$C V = - 1.646 i r r^{2} + 82.849 i r r - 937.277$	$0.999$	$339.909$	$1.081$	$0.038$	$0.54094$

Table 10 reports the correlation coefficient $R$ for the four distance-based topological indices under the three curvilinear models: linear, quadratic, and cubic. Table 11 then gives the most accurate predictions of the physicochemical properties based on the linear or quadratic models. The sum of the electronic and zero-point energies $(SEZ_{P}E)$, the sum of the electronic and thermal energies $(SET_{Energy})$, the sum of the electronic and thermal enthalpies $(SET_{Enthalpy})$, and the sum of the electronic and thermal free energies $(SETF_{Energy})$ have the same correlation coefficients, which is why $(SEZ_{P}E)$ is the only one listed in Table 10. The cubic model is evidently the best model for predicting all physicochemical properties of the Fibrates. Note that the correlation coefficients for the cubic model are displayed in bold. Table 11 and Figure 8 illustrate the best linear and quadratic models relating the distance-based topological indices to the properties.

Table 10. Correlation coefficients (R) between the physicochemical properties of various Fibrates drugs and their distance-based topological indices, obtained with the linear, quadratic, and cubic regression models.

P.P.	$W$ (linear, quadratic, cubic)	$S$ (linear, quadratic, cubic)	$H$ (linear, quadratic, cubic)	$Gut$ (linear, quadratic, cubic)
$D M$	$0.334, 0.991, 1$	$0.335, 0.989, 1$	$0.465, 0.750, 0.750$	$0.349, 0.997, 1$
P	$0.185, 0.332, 1$	$0.198, 0.321, 1$	$0.166, 0.958, 0.971$	$0.209, 0.383, 1$
$Z_{P} V E$	$0.086, 0.152, 1$	$0.1, 0.144, 1$	$0.042, 0.908, 0.950$	$0.108, 0.205, 1$
$C V$	$0.207, 0.297, 1$	$0.221, 0.292, 1$	$0.177, 0.937, 0.954$	$0.230, 0.345, 1$
S	$0.258, 0.305, 1$	$0.272, 0.306, 1$	$0.220, 0.917, 0.936$	$0.280, 0.348, 1$
$S E Z_{P} E$	$0.731, 0.977, 1$	$0.734, 0.976, 1$	$0.807, 0.950, 0.950$	$0.744, 0.986, 1$
$X L o g P 3$	$0.696, 0.819, 1$	$0.688, 0.819, 1$	$0.789, 0.839, 0.859$	$0.690, 0.793, 1$
C	$0.954, 0.995, 1$	$0.955, 0.996, 1$	$0.979, 0.999, 0.999$	$0.960, 0.998, 1$
$T P A$	$0.859, 0.876, 1$	$0.866, 0.885, 1$	$0.790, 0.854, 0.856$	$0.867, 0.881, 1$

Table 11. The linear and quadratic regression models provide the most accurate predictions for the physicochemical properties.

Best linear and quadratic regression model	$R^{2}$	F	$S e$	$S F$	$R M S E$
$D M = \begin{matrix} - 2.010 + (0.003) G u t \\ - (3.239 E^{- 7}) G u t^{2} \end{matrix}$	$0.994$	$84.508$	$0.113$	$0.077$	$0.9007267$
$P = - 1200.200 + (44.294) H - (0.326) H^{2}$	$0.918$	$5.597$	$24.537$	$0.286$	$12.239175$
$Z_{P} V E = \begin{matrix} - 925.184 + (35.716) H \\ - (0.265) H^{2} \end{matrix}$	$0.824$	$1.252$	$30.377$	$0.534$	$15.16917$
$C V = - 371.268 + (14.228) H - (0.105) H^{2}$	$0.878$	$3.585$	$9.869$	$0.350$	$4.9268046$
$S = - 463.186 + (19.618) H - (0.144) H^{2}$	$0.840$	$2.634$	$16.012$	$0.399$	$7.9943483$
$S E Z_{P} E = \begin{matrix} - 354.281 - (0.591) G u t \\ + (5.778 E^{- 5}) G u t^{2} \end{matrix}$	$0.972$	$17.347$	$63.547$	$0.167$	$36.510223$
$X L o g P 3 = 9.019 - (0.202) H + (0.002) H^{2}$	$0.704$	$1.188$	$0.827$	$0.544$	$0.413171$
$C = - 615.681 + (25.600) H - (0.153) H^{2}$	$0.999$	$475.806$	$6.051$	$0.032$	$3.0262032$
$C = 26.598 + (5.018) H$	$0.958$	$45.340$	$27.142$	$0.021$	$19.193309$
$T P A = 49.861 - (0.008) S + (1.334 E^{- 6}) S^{2}$	$0.783$	$1.799$	$13.664$	$0.466$	$6.9466441$
$T P A = 29.462 + (0.005) G u t$	$0.751$	$6.031$	$10.340$	$0.133$	$7.3111917$

The physicochemical properties of Fibrates drugs and their corresponding degree-based and distance-based topological indices were analyzed using three curvilinear models: linear, quadratic, and cubic. The aim was to identify, for each property studied, the model that gives the highest correlation coefficient.

Table 4 shows the correlation coefficients ($R$) obtained by the linear regression model between the degree-based topological indices and the physicochemical properties of the Fibrates drugs. The correlation coefficients vary across the topological indices and physicochemical properties. A positive correlation indicates that two variables tend to move in the same direction, while a negative correlation indicates that they move in opposite directions. In particular, for the first Zagreb index $M_{1}(ζ)$ the correlation coefficient lies between $0.740$ and $1$, with the best prediction, $R = 1$, obtained for complexity $(C)$. For the second Zagreb index $M_{2}(ζ)$ the range of the correlation coefficient is $0.729 \leq R \leq 0.941$, which indicates high prediction of all physicochemical properties under study. The highest correlation coefficients were observed for the $(SEZ_{P}E)$ property, with values ranging from $0.887$ to $0.998$, followed by the $(TPA)$ property, with values ranging from $0.786$ to $0.826$. The other topological indices showed weaker correlations with the physicochemical properties, with correlation coefficients ranging from $0.647$ to $0.967$.

Table 5 lists five linear regression models and their corresponding $R^{2}$ and $RMSE$ values. $R^{2}$, the coefficient of determination, measures how well the independent variable in a linear regression model explains the variation in the dependent variable. It ranges from 0 to 1, with 1 indicating a perfect fit. $RMSE$, the root mean squared error, measures how closely the model’s predictions match the actual values: it is the square root of the mean squared difference between the predicted and actual values, and lower values indicate better accuracy. All five models have relatively high $R^{2}$ values, indicating that they explain a substantial share of the variation in the dependent variable. The lowest $R^{2}$ value is $0.639$, which is still considered a relatively good fit. However, the models differ in prediction accuracy as measured by $RMSE$. The $XLogP3$ model based on the min-max rodeg index $mM_{s-de}(ζ)$ has the lowest $RMSE$, $0.495$, which suggests that it gives the most accurate predictions among the five models. The $C$ model based on the first Zagreb index $M_{1}$ has the second lowest $RMSE$, $1.156$, followed by the $SEZ_{P}E$ model with an $RMSE$ of $6.423$. The $TPA$ models based on the $ABC$ index and the $R$ index have the highest $RMSE$ values, $8.260$ and $8.250$, respectively, indicating that their predictions are the least accurate among the five. In summary, all five models fit the data well according to $R^{2}$, and the $XLogP3$ model is the most accurate according to $RMSE$.
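As a minimal sketch of how $R^{2}$ and $RMSE$ can be obtained for a fitted model, the following Python snippet uses NumPy's `polyfit` and `polyval`. The helper name `fit_polynomial` and all numerical values are hypothetical and serve only as an illustration; they are not data from this study.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns (coefficients, R^2, RMSE)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)           # highest power first
    y_hat = np.polyval(coeffs, x)               # fitted values
    ss_res = float(np.sum((y - y_hat) ** 2))    # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2)) # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt(ss_res / len(y)))      # root mean squared error
    return coeffs, r2, rmse


# Hypothetical index and property values, for illustration only.
index_values = [10.0, 12.0, 15.0, 19.0, 24.0]
property_values = [3.1, 3.9, 5.2, 6.8, 9.1]

coeffs, r2, rmse = fit_polynomial(index_values, property_values, degree=1)
print(f"slope = {coeffs[0]:.3f}, intercept = {coeffs[1]:.3f}")
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

Passing `degree=2` or `degree=3` to the same helper gives the quadratic and cubic fits. Note that $RMSE$ carries the units of the dependent variable, so values for different properties are on different scales.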

Table 6 presents the correlation coefficients $(R)$ obtained by the quadratic regression model between the topological indices and the physicochemical properties of the Fibrates drugs. Several findings stand out. First, many of the correlation coefficients are high, indicating a strong quadratic relationship between the topological indices and the physicochemical properties. For instance, $σ(ζ)$ has a correlation coefficient of $0.947$ with $(DM)$, and $Mm_{s-de}(ζ)$ has a correlation coefficient of $0.997$ with $(P)$. Furthermore, some of the correlation coefficients round to $1.000$, indicating an essentially perfect fit. For example, the $M_{1}$, $ABC$, $R$, $mM_{s-de}$, and $ISI$ indices each have a correlation coefficient of $1.000$ with $(C)$. Similarly, the $irr$ index has a correlation coefficient of $1.000$ with $SEZ_{P}E$, $SET_{Energy}$, $SET_{Enthalpy}$, and $SETF_{Energy}$. On the other hand, some correlation coefficients are low. For instance, the $ISDI$ index has a correlation coefficient below $0.64$ for most of the properties, the exceptions being $(XLogP3)$ $(R = 0.736)$, $(C)$ $(R = 0.922)$, $(TPA)$ $(R = 0.810)$, and $(SEZ_{P}E)$ $(R = 0.669)$. It is also worth noting that the table contains no negative values. Some of the correlation coefficients are moderate; for instance, $(TPA)$ has correlation coefficients in the range $0.712 \leq R \leq 0.882$. Overall, Table 6 shows that the strength of the relationship between the topological indices and the physicochemical properties of the Fibrates drugs varies from weak to strong.

Looking at Table 7, all five models for the complexity property $(C)$ have high $R^{2}$ values, the lowest being $0.999$ and the highest $1.000$. This suggests that all five models explain the variation in $(C)$ well. The second quantity to consider is the $RMSE$, where a lower value indicates a better fit. For these five models the $RMSE$ values range from $0.710303$ to $5.291185$. The model with the lowest $RMSE$ is the second one, $C = -22.6208403 R^{2} + 485.9459637 R - 2132.8797476$, based on the Randić index; it therefore gives the best fit for $(C)$, although all five models provide good estimates. Beyond the models for $(C)$, Table 7 contains five quadratic regression models with both high $R^{2}$ values and low $RMSE$ values. The model for $(S)$ has $R^{2} = 0.999$ and $RMSE = 0.732681$, making it one of the best models in terms of accurately predicting the target variable. The model for $(Z_{P}VE)$ has $R^{2} = 0.996$ and $RMSE = 2.193392$, the model for $(CV)$ has $R^{2} = 0.999$ and $RMSE = 0.540583$, the model for $(SEZ_{P}E)$ has $R^{2} = 0.999$ and $RMSE = 5.68161$, and the model for $(P)$ has $R^{2} = 0.995$ and $RMSE = 3.057903$. These models can be considered the best in terms of their ability to fit the data and accurately predict the target variable.
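To illustrate how a tabulated model is applied, the sketch below evaluates the quadratic model for complexity $(C)$ as a function of the Randić index, using the coefficients as they are listed in Table 7. The function names and the input value $11.0$ are hypothetical and chosen only for illustration; they do not correspond to any specific compound in the study.

```python
# Coefficients of the quadratic model C = a*R^2 + b*R + c,
# copied from the Randic-index row for (C) in Table 7.
A = -22.6208403
B = 485.9459637
C0 = -2132.8797476


def complexity_from_randic(r):
    """Evaluate the tabulated quadratic model at Randic index r."""
    return A * r**2 + B * r + C0


def turning_point():
    """Randic index at which the fitted parabola reaches its maximum."""
    return -B / (2.0 * A)


# Hypothetical input value, for illustration only.
r_example = 11.0
print(f"predicted C at R = {r_example}: {complexity_from_randic(r_example):.2f}")
print(f"parabola maximum at R = {turning_point():.3f}")
```

Because the leading coefficient is negative, the fitted curve decreases beyond its turning point, so predictions far outside the range of the fitted compounds should be treated with caution.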

Table 8 presents the correlation coefficients $(R)$ obtained by the cubic regression model between the topological indices and the physicochemical properties of the Fibrates drugs. The range of the correlation coefficient varies from row to row. For instance, for the first Zagreb index $(M_{1})$ it ranges from $0.811$ to $1.000$, while for the inverse symmetric deg index $(ISDI)$ it ranges from $0.650$ to $0.997$. Overall, most of the correlation coefficients are high, many of them close to $1.000$. This suggests a strong correlation between the topological indices and the physicochemical properties of the Fibrates drugs, and indicates that the topological indices could be used to predict these properties with high accuracy. The cubic regression model provides the highest correlation coefficients for most combinations of topological index and physicochemical property, which makes it the best choice among the three models for analyzing the relationship between the two.

Based on Table 9, the models can also be compared with respect to high $R^{2}$ and minimum $RMSE$. The $(XLogP3)$ model based on the $ISDI$ index has $R^{2} = 0.900$ and $RMSE = 0.00001$; the very low $RMSE$ suggests that the values predicted with this index are very close to the actual values. The $(TPA)$ model based on the $ISDI$ index has $R^{2} = 1.000$ and $RMSE = 0.00003$, indicating an essentially perfect fit.

A closer look at Table 10, which considers only the distance-based topological indices, shows that the cubic model gives the highest correlations with all the investigated physicochemical properties of the Fibrates drugs, with correlation coefficients ranging from $0.750$ to $1.000$. In second place is the quadratic model, which gives good correlations with most of these properties, with correlation coefficients ranging from $0.750$ to $0.999$. The linear model comes third: it shows good correlation with the smallest number of properties, with correlation coefficients ranging from $0.688$ to $0.979$. Importantly, in most cases the linear and quadratic models give comparable correlation coefficients, while the cubic model improves the correlation coefficients significantly for most of the properties. For instance, for the polarizability $(P)$ estimated using the Wiener index, the correlations are comparable, $R = 0.185$ and $R = 0.332$ for the linear and quadratic models, respectively, and improve to $1$ with the cubic model. The choice of model type therefore matters when dealing with such properties. Generally speaking, the last four properties in Table 10 are estimated very well by all three models compared with the first five. The complexity $(C)$ is the best-estimated property, since the correlations with each model reach $\sim 1$. The topological polar area $(TPA)$ is the second-best estimated property across the three models, followed by the sum of the electronic and zero-point energies $(SEZ_{P}E)$. Conversely, the zero-point vibrational energy $(Z_{P}VE)$ and the heat capacity $(CV)$ are the properties estimated least well by the linear and quadratic models: the correlations do not exceed $0.345$, the exception being the quadratic model of the $H(ζ)$ index, with $R = 0.908$ and $0.937$, respectively.

Based on the $RMSE$ values given in Table 11, the three models with the lowest $RMSE$ are all quadratic: $XLogP3 = 9.019 - (0.202)H + (0.002)H^{2}$ with $RMSE = 0.4132$, $DM = -2.010 + (0.003)Gut - (3.239E^{-7})Gut^{2}$ with $RMSE = 0.9007$, and $C = -615.681 + (25.600)H - (0.153)H^{2}$ with $RMSE = 3.0262$. These models exhibit the lowest $RMSE$ values in Table 11, indicating higher accuracy than the other linear and quadratic models based on the distance-based indices. Overall, based on the results obtained, it can be concluded that the cubic and quadratic regression models are the top predictors for the physicochemical properties analyzed in this investigation, as they combine high $R^{2}$ values with minimum $RMSE$ values. These findings highlight the effectiveness of these regression models in enhancing the analysis of fibrates drug activity through molecular descriptors and provide valuable insights for future research in this area.
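The ordering of the three models by correlation coefficient follows partly from how they are nested: a quadratic fit contains the linear fit as a special case, and a cubic fit contains both, so the least-squares $R$ cannot decrease as the degree increases. The sketch below illustrates this with NumPy on four hypothetical data points (the function name `correlation_by_degree` and the values are assumptions for illustration, not data from this study). With exactly four distinct points, a full cubic has four coefficients and reproduces the data exactly.

```python
import numpy as np


def correlation_by_degree(x, y, max_degree=3):
    """Return {degree: R} for least-squares polynomial fits of degree 1..max_degree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ss_tot = np.sum((y - y.mean()) ** 2)
    result = {}
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x, y, degree)
        residuals = y - np.polyval(coeffs, x)
        r_squared = 1.0 - np.sum(residuals ** 2) / ss_tot
        # Guard against tiny negative values caused by rounding.
        result[degree] = float(np.sqrt(max(0.0, r_squared)))
    return result


# Four hypothetical (index, property) pairs, for illustration only.
x_values = [1.0, 2.0, 3.0, 4.0]
y_values = [2.0, -1.0, 4.0, 0.5]

for degree, r in correlation_by_degree(x_values, y_values).items():
    print(f"degree {degree}: R = {r:.3f}")
```

For this reason, $R^{2}$ is more informative when read together with $RMSE$ and the $F$ statistic, as is done in the tables above.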

4. Conclusion

Based on our comprehensive analysis, we have demonstrated that the use of curvilinear regression models can significantly enhance the analysis of fibrates drug activity through molecular descriptors. Our results have revealed that these models have superior predictive power compared to linear regression models, especially when the underlying data exhibits nonlinear relationships. Furthermore, the incorporation of molecular descriptors as independent variables has substantially improved the accuracy and robustness of the models. Our findings have several important implications for the field of drug discovery and development. Firstly, the use of curvilinear regression models, in conjunction with molecular descriptors, can facilitate the identification and optimization of more potent and selective drugs, thus reducing the time and cost associated with drug development. Secondly, our study underscores the importance of considering nonlinear relationships between molecular descriptors and drug activity, which has traditionally been overlooked in conventional linear regression analyses. Lastly, the efficacy of curvilinear regression models and molecular descriptors in predicting drug activity may be extended to other drug classes and further elucidated through future studies. In summary, our investigation demonstrates that curvilinear regression models represent a powerful approach for analyzing drug activity, particularly when coupled with molecular descriptors. Our results provide a basis for the development of improved drug discovery pipelines and offer insights into the molecular mechanisms governing drug activity.

Author Contributions

Conceptualization, S.W. and N.U.O.; methodology, S.W.; validation, S.W. and N.U.O.; formal analysis, S.W.; investigation, S.W.; resources, N.U.O.; data curation, N.U.O.; writing (original draft preparation), S.W.; writing (review and editing), S.W. and N.U.O.; supervision, S.W.; project administration, S.W.; funding acquisition, S.W. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

The article contains the data that supported the study’s findings.

Acknowledgments

This research work was funded by Institutional Fund Projects under grant no. (IFPIP: 214-247-1443). The authors gratefully acknowledge technical and financial support provided by the Ministry of Education and King Abdulaziz University, DSR, Jeddah, Saudi Arabia. The authors acknowledge Nuha Wazzan from the Chemistry Department at King Abdulaziz University for her contribution to the DFT calculations and King Abdulaziz University’s High-Performance Computing Centre (Aziz Supercomputer) (http://hpc.kau.edu.sa) for supporting the computations described in this paper.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

Gonzalez-Diaz, H., Vilar, S., Santana, L. and Uriarte, E., 2007. Medicinal chemistry and bioinformatics-current trends in drugs discovery with networks topological indices. Current topics in medicinal chemistry, 7(10), pp.1015-1029.
Estrada, E. and Uriarte, E., 2001. Recent advances on the role of topological indices in drug discovery research. Current Medicinal Chemistry, 8(13), pp.1573-1588. [CrossRef]
Gao, W., Wang, W. and Farahani, M.R., 2016. Topological indices study of molecular structure in anticancer drugs. Journal of chemistry, 2016. [CrossRef]
Gao, W., Farahani, M.R. and Shi, L., 2016. Forgotten topological index of some drug structures. Acta medica mediterranea, 32(1), pp.579-585.
P.A. McCullough, and M.J. Di Loreto, 2012. "Fibrates and cardiorenal outcomes." Journal of the American College of Cardiology, 60(20), pp.2072-2073. [CrossRef]
Brea, A., Millán, J., Ascaso, J.F., Blasco, M., Díaz, A., Hernández-Mijares, A., Mantilla, T., Pedro-Botet, J.C. and Pintó, X., 2018. Fibrates in primary prevention of cardiovascular disease. Comments on the results of a systematic review of the Cochrane Collaboration. Clínica e Investigación en Arteriosclerosis (English Edition), 30(4), pp.188-192.
J. Devillers, and A. T. Balaban, eds. Topological Indices and Related Descriptors in QSAR and QSPR (CRC Press, Boca Raton, 2000).
I. Gutman, “A Property of the Simple Topological Index,” MATCH Communications in Mathematical and in Computer Chemistry 25 (1990): 131–40.
H. Wiener, “Structural Determination of Paraffin Boiling Points,” Journal of the American Chemical Society 69, no. 1 (1947): 17–20. [CrossRef]
W. Gao, Y. Wang, B. Basavanagoud, and M. K. Jamil, “Characteristics Studies of Molecular Structures in Drugs,” Saudi Pharmaceutical Journal 25, no. 4 (2017): 580–6. [CrossRef]
T. Doslic, T. Reti, and A. Ali, “On the Structure of Graphs with Integer Sombor Indices,” Discrete Mathematics Letters 7 (2021): 1–4.
I. Gutman, “Geometric Approach to Degree-Based Topological Indices: Sombor Indices,” MATCH Communications in Mathematical and in Computer Chemistry 86 (2021): 11–6.
Ediz, S., Çiftçi, İ., Cancan, M. and Farahani, M.R., 2021. "On k-total distance degrees and k-total Wiener polarity index". Journal of Information and Optimization Sciences, 42(7), pp.1469-1477.
M. Matejić, E. Zogić, E. Milovanović, and I. Milovanović, “A Note on the Laplacian Resolvent Energy of Graphs,” Asian-European Journal of Mathematics 13, no. 06 (2020): 2050119. [CrossRef]
H. Wiener, “Structural Determination of Paraffin Boiling Points,” Journal of the American Chemical Society 69, no. 1 (1947): 17–20. [CrossRef]
I. Gutman, and N. Trinajstić, “Graph Theory and Molecular Orbitals. Total p-Electron Energy of Alternant Hydrocarbons,” Chemical Physics Letters 17, no. 4 (1972): 535–8. [CrossRef]
M. Randić, “On Characterization of Molecular Branching,” Journal of the American Chemical Society 97, no. 23 (1975): 6609–15. [CrossRef]
E. Estrada, “Characterization of 3D Molecular Structure,” Chemical Physics Letters 319, no. 5-6 (2000): 713–8. [CrossRef]
H. Hosoya, “Topological Index. A Newly Proposed Quantity Characterizing the Topological Nature of Structural Isomers of Saturated Hydrocarbons,” Bulletin of the Chemical Society of Japan 44, no. 9 (1971): 2332–9. [CrossRef]
E. Estrada, and D. Bonchev, Chemical Graph Theory (New York: Chapman and Hall/CRC, 2013).
I. Gutman, B. Ruscic, N. Trinajstic, and C. F. Wilson Jr., “Graph theory and molecular orbitals. XII. Acyclic polyenes,” The Journal of Chemical Physics, vol. 62, no. 9, pp. 3399–3405, 1975.
G. H. Shirdel, H. Rezapour, and A. M. Sayadi, “The hyper Zagreb index of graph operations,” Iranian Journal of Mathematical Chemistry, vol. 4, pp. 213–220, 2013.
Togan, M., Yurttas, A., Cevik, A.S., and Cangul, I.N., 2019. "Effect of edge deletion and addition on Zagreb indices of graphs". In Mathematical Methods in Engineering (pp. 191-201). Springer, Cham.
Togan, M., Yurttas, A., Çevik, A.S., and Cangul, I.N., 2019. "Zagreb indices and multiplicative Zagreb indices of double graphs of subdivision graphs". TWMS Journal of Applied and Engineering Mathematics, 9(2), pp.404-412.
Gutman, I., Togan, M., Yurttas, A., Cevik, A.S., and Cangul, I.N., “Inverse problem for sigma index,” MATCH Communications in Mathematical and in Computer Chemistry, vol. 79, pp. 491–508, 2018.
M. Ghorbani, S. Zangi, and N. Amraei, “New results on symmetric division deg index,” Journal of Applied Mathematics and Computing, vol. 65, pp. 161–176, 2021. [CrossRef]
D. Vukičević and M. Gašperov, “Bond additive modeling 1. Adriatic indices,” Croatica Chemica Acta, vol. 83, pp. 243–260, 2010.
Richardson, C.W., Foster, G.R. and Wright, D.A., 1983. Estimation of erosion index from daily rainfall amount. Transactions of the ASAE, 26(1), pp.153-0156. [CrossRef]
Das, K.C., Gutman, I. and Furtula, B., 2011. On atom-bond connectivity index. Chemical Physics Letters, 511(4-6), pp.452-454. [CrossRef]
Dalfó, C., 2019. On the Randić index of graphs. Discrete Mathematics, 342(10), pp.2792-2796.
Jahanbani, A., 2019. Albertson energy and Albertson Estrada index of graphs. Journal of Linear and Topological Algebra, 8(01), pp.11-24.
Klavžar, S., Rajapakse, A. and Gutman, I., 1996. The Szeged and the Wiener index of graphs. Applied Mathematics Letters, 9(5), pp.45-49. [CrossRef]
Xu, K. and Das, K.C., 2011. On Harary index of graphs. Discrete applied mathematics, 159(15), pp.1631-1640.
Mukwembi, S., 2012. On the upper bound of Gutman index of graphs. Match-Communications in Mathematical and Computer Chemistry, 68(1), p.343.
O. Ç. Havare, “Topological indices and QSPR modeling of some novel drugs used in the cancer treatment,” International Journal of Quantum Chemistry, vol. 121, no. 24, Article ID e26813, 2021. [CrossRef]
D. Vukicevic, “Boad additime modeling 2. Mathematicpl properties mf max-mrn rodig index,” Crica Chemica Actata.vol. 83, no. 3, pp. 261–273, 2010.
E. Estrada, L. Torres, L. Rodrıguez, and I. Gutman, “An atombond connectivity index: modelling the enthalpy of formation of alkanes,” Indian Journal of Chemistry, vol. 37, pp. 849–855, 1998.
G. V. Rajasekharaiah and U. P. Murthy, “Hyper-Zagreb indices of graphs and its applications,” Journal of Algebra Combinatorics Discrete Structures and Applications, vol. 8, no. 1, pp. 9–22, 2020. [CrossRef]
D. Vukiccevic and M. Gasparov, “Bond additive modeling 1. Adriatic indices,” Crica Chemica Actata.vol. 83, pp. 243–260, 2010.
Ö. Çolakŏglu Havare, “Determination of some thermodynamic properties of monocarboxylic acids using multiple linear regression,” BEU Journal of Science, vol. 8, no. 2, pp. 466–471, 2019.
Lokesha, V. , Shruti, R. and sinan CEVIK, A., 2018. On certain topological indices of Nanostructures using QG and RG operators. Communications Faculty of Sciences University of Ankara Series A1 Mathematics and Statistics, 67(2), pp.178-187.
T. Reti, R. T. Reti, R. Sharafdini, A. Dregelyi-Kiss, and H. Haghbin, “Graph irregularity indices used as molecular descriptors in QSPR studies,” MATCH Communications in Mathematical and in Computer Chemistry, vol. 79, pp. 509–524, 2018.
Wiener, H., 1948. Relation of the physical properties of the isomeric alkanes to molecular structure. Surface tension, specific dispersion, and critical solution temperature in aniline. The Journal of Physical Chemistry, 52(6), pp.1082-1089.
Dobrynin, A.A., Entringer, R. and Gutman, I., 2001. Wiener index of trees: theory and applications. Acta Applicandae Mathematica, 66(3), pp.211-249.
Castro, E.A. and Tueros, M., 2001. QSPR study of boiling points of alkyl alcohols via improved polynomial relationships. Philippine Journal of Science, 130(2), pp.111-118.
Delen, S., Khan, R.H., Kamran, M., Salamat, N., Baig, A.Q., Naci Cangul, I. and Pandit, M.K., 2022. Ve-Degree, Ev-Degree, and Degree-Based Topological Indices of Fenofibrate. Journal of Mathematics, 2022.
Liu, X., Chen, W., Gao, H., & Shi, Y. (2021). QSPR models for predicting the densities and viscosities of biodiesel using topological indices. Symmetry, 13(4), 544.
Zuo, J., & Hu, L. (2020). QSPR modeling of the melting points of organic compounds using molecular topology and quantum chemical descriptors. Symmetry, 12(7), 1104.
Zhang, Y., Li, H., Liu, Y., & Zhou, P. (2019). QSPR models for predicting melting points of organic compounds based on molecular topology. Symmetry, 11(1), 25.
Naghipour, S., & Kiasat, A. R. (2019). Application of topological indices in QSPR modeling of C60 derivatives’ fullerene-like behavior. Symmetry, 11(3), 368.
Wang, J., & Xu, L. (2018). QSPR models for predicting the boiling points of alkyl alkanes based on the novel vertex degree valence topological index. Symmetry, 10(7), 282.
Frisch, M.J., Gaussian 09 Programmer’s Reference. 2009, Gaussian.
Dennington, R., T. Keith, and J. Millam, GaussView. 2009, Semichem Inc.: Shawnee Mission, KS.
Hanwell, M.D., et al., Avogadro: an advanced semantic chemical editor, visualization, and analysis platform. Journal of Cheminformatics, 2012. 4(1): p. 17.
O’Boyle, N.M., A.L. Tenderholt, and K.M. Langner, cclib: a library for package-independent computational chemistry algorithms. Journal of Computational Chemistry, 2008. 29(5): p. 839-845.

Figure 1. Fenofibrate: (1) optimized geometries, (2) ESPM, (3) DOS plots, and (4) HOMOs and LUMOs.

Figure 2. Ciprofibrate: (1) optimized geometries, (2) ESPM, (3) DOS plots, and (4) HOMOs and LUMOs.

Figure 3. Bezafibrate: (1) optimized geometries, (2) ESPM, (3) DOS plots, and (4) HOMOs and LUMOs.

Figure 4. Clofibrate: (1) optimized geometries, (2) ESPM, (3) DOS plots, and (4) HOMOs and LUMOs.

Figure 5. Plots of Linear Regression Equations for the Best Physicochemical Properties Predicted by Degree-based Topological Indices.

Figure 6. Plots of Quadratic Regression Equations for the Best Physicochemical Properties Predicted by Degree-based Topological Indices.

Figure 7. Plots of Cubic Regression Equations for the Best Physicochemical Properties Predicted by Degree-based Topological Indices.

Figure 8. Plots of Linear and Quadratic Regression Equations for the Best Physicochemical Properties Predicted by Distance-based Topological Indices.

Table 1. The mathematical expressions of topological indices.

Vertex-degree-based topological indices	Mathematical expression
First Zagreb index	$M_{1}(ζ) = \sum_{uv \in E(ζ)} (d(u) + d(v))$
Second Zagreb index	$M_{2}(ζ) = \sum_{uv \in E(ζ)} (d(u) \cdot d(v))$
Hyper Zagreb index	$HM(ζ) = \sum_{uv \in E(ζ)} (d(u) + d(v))^{2}$
Atom bond connectivity index	$ABC(ζ) = \sum_{uv \in E(ζ)} \sqrt{\frac{d(u) + d(v) - 2}{d(u) \cdot d(v)}}$
Randić index	$R(ζ) = \sum_{uv \in E(ζ)} \frac{1}{\sqrt{d(u) \cdot d(v)}}$
Max-min rodeg index	$Mm_{s-de}(ζ) = \sum_{uv \in E(ζ)} \sqrt{\frac{\max(d(u), d(v))}{\min(d(u), d(v))}}$
Min-max rodeg index	$mM_{s-de}(ζ) = \sum_{uv \in E(ζ)} \sqrt{\frac{\min(d(u), d(v))}{\max(d(u), d(v))}}$
Albertson index	$irr(ζ) = \sum_{uv \in E(ζ)} |d(u) - d(v)|$
Sigma index	$σ(ζ) = \sum_{uv \in E(ζ)} (d(u) - d(v))^{2}$
Inverse symmetric deg index	$ISDI(ζ) = \sum_{uv \in E(ζ)} \frac{d(u) \cdot d(v)}{d(u)^{2} + d(v)^{2}}$
Inverse sum indeg index	$ISI(ζ) = \sum_{uv \in E(ζ)} \frac{d(u) \cdot d(v)}{d(u) + d(v)}$
Distance-based topological indices	Mathematical expression
Wiener index	$W(ζ) = \sum_{\{u,v\} \subseteq V(ζ)} d(u,v)$
Schultz index	$S(ζ) = \sum_{\{u,v\} \subseteq V(ζ)} (d(u) + d(v))\, d(u,v)$
Harary index	$H(ζ) = \sum_{\{u,v\} \subseteq V(ζ)} \frac{1}{d(u,v)}$
Gutman index	$Gut(ζ) = \sum_{\{u,v\} \subseteq V(ζ)} (d(u) \cdot d(v))\, d(u,v)$
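The expressions in Table 1 can be evaluated directly from an edge list of the hydrogen-suppressed molecular graph. The following is a minimal sketch, not the authors' implementation: the function names, the dictionary keys, and the four-vertex path `PATH4` used as input are illustrative assumptions, and the graph is assumed to be connected and simple.

```python
from collections import deque
from itertools import combinations
from math import sqrt

# Hypothetical test graph: a path on four vertices (not one of the Fibrates).
PATH4 = [(0, 1), (1, 2), (2, 3)]


def degrees(edges):
    """Return {vertex: degree} for an undirected simple graph."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def degree_indices(edges):
    """Vertex-degree-based indices of Table 1, each a sum over the edges uv."""
    d = degrees(edges)
    pairs = [(d[u], d[v]) for u, v in edges]
    return {
        "M1": sum(a + b for a, b in pairs),
        "M2": sum(a * b for a, b in pairs),
        "HM": sum((a + b) ** 2 for a, b in pairs),
        "ABC": sum(sqrt((a + b - 2) / (a * b)) for a, b in pairs),
        "R": sum(1 / sqrt(a * b) for a, b in pairs),
        "Mm": sum(sqrt(max(a, b) / min(a, b)) for a, b in pairs),
        "mM": sum(sqrt(min(a, b) / max(a, b)) for a, b in pairs),
        "irr": sum(abs(a - b) for a, b in pairs),
        "sigma": sum((a - b) ** 2 for a, b in pairs),
        "ISDI": sum(a * b / (a * a + b * b) for a, b in pairs),
        "ISI": sum(a * b / (a + b) for a, b in pairs),
    }


def bfs_distances(adj, source):
    """Shortest-path lengths d(source, v) by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def distance_indices(edges):
    """Distance-based indices of Table 1, summed over unordered pairs {u, v}.

    Assumes a connected graph; a disconnected one raises KeyError.
    """
    d = degrees(edges)
    adj = {v: [] for v in d}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {v: bfs_distances(adj, v) for v in adj}
    out = {"W": 0, "S": 0, "H": 0.0, "Gut": 0}
    for u, v in combinations(adj, 2):
        duv = dist[u][v]
        out["W"] += duv
        out["S"] += (d[u] + d[v]) * duv
        out["H"] += 1 / duv
        out["Gut"] += d[u] * d[v] * duv
    return out


if __name__ == "__main__":
    print(degree_indices(PATH4))
    print(distance_indices(PATH4))
```

For the path on four vertices the degrees are 1, 2, 2, 1, so the sums are small enough to check by hand against the definitions, for example $M_{1} = 3 + 4 + 3 = 10$ and $W = 3 \cdot 1 + 2 \cdot 2 + 1 \cdot 3 = 10$.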

Table 2. Values of topological indices in Fibrates’ molecular structures.

Topological index	Fenofibrate	Ciprofibrate	Bezafibrate	Clofibrate
$M_{1}(ζ)$	126	98	124	76
$M_{2}(ζ)$	143	115	139	84
$HM(ζ)$	626	520	606	374
$ABC(ζ)$	19.1	14.12	19.03	11.78
$R(ζ)$	11.68	8.22	11.77	7.45
$mM_{s-de}(ζ)$	20.44	14.19	20.864	12.33
$Mm_{s-de}(ζ)$	34.7	26.95	33.9693	21.79
$irr(ζ)$	30	28	28	20
$σ(ζ)$	54	60	50	38
$ISDI(ζ)$	10.92	7.5704	11.12	6.61
$ISI(ζ)$	28.59	21.4952	28.34	17.01
$W(ζ)$	1716	660	1882	468
$S(ζ)$	6872	2652	7600	1776
$H(ζ)$	87.5476	55.1468	84.5541	45.5162
$Gut(ζ)$	6846	2638	7650	1670

Table 3. The physicochemical properties of potential drugs of Fibrates.

Physicochemical properties	Fenofibrate	Ciprofibrate	Bezafibrate	Clofibrate
$(\mathrm{DM})$	3.98025	3.94641	3.01127	2.19815
$(\mathrm{P})$	164.27567	244.49533	232.43367	144.46
$(\mathrm{SEZ_{P}E})$	-1649.62662	-1535.54779	-1551.61567	-1151.954
$(\mathrm{SET\,Energy})$	-1649.60875	-1535.52308	-1551.59139	-1151.93749
$(\mathrm{SET\,Enthalpy})$	-1649.6078	-1535.52214	-1551.59044	-1151.93749
$(\mathrm{SETF\,Energy})$	-1649.67518	-1535.60609	-1551.6753	-1152.00027
$(\mathrm{Z_{P}VE})$	155.46481	231.67184	225.46799	157.25221
$(\mathrm{CV})$	66.502	92.538	91.009	61.172
$(\mathrm{S})$	141.803	176.701	178.604	134.118
$(\mathrm{XLogP3})$	5.2	3.4	3.8	3.3
$(\mathrm{C})$	458	333	452	232
$(\mathrm{TPA})$	52.6	46.5	75.6	35.5
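The linear, quadratic, and cubic fits plotted in Figures 5 to 7 can be reproduced in outline by regressing one row of Table 3 on one row of Table 2. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it pairs the first Zagreb index $M_{1}$ with the property labelled (C), chosen only as an example, and the helper names `fit_polynomial` and `r_squared` are hypothetical. It uses ordinary least squares via the normal equations in plain Python; the statistical software used in the study is not stated here.

```python
# Values copied from Table 2 (row M1) and Table 3 (row C), in the order
# Fenofibrate, Ciprofibrate, Bezafibrate, Clofibrate.
M1 = [126, 98, 124, 76]
C = [458, 333, 452, 232]


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial of the given degree; returns a predictor."""
    # Centre and scale x so the normal equations stay well conditioned.
    mean = sum(xs) / len(xs)
    scale = max(abs(x - mean) for x in xs) or 1.0
    ts = [(x - mean) / scale for x in xs]
    n = degree + 1
    # Normal equations a * coef = b in the basis 1, t, t^2, ...
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / a[i][i]
    return lambda x: sum(c * ((x - mean) / scale) ** k
                         for k, c in enumerate(coef))


def r_squared(xs, ys, predict):
    """Coefficient of determination of the fitted curve on (xs, ys)."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    for name, degree in [("linear", 1), ("quadratic", 2), ("cubic", 3)]:
        model = fit_polynomial(M1, C, degree)
        print(name, round(r_squared(M1, C, model), 6))
```

Because only four compounds are available, a cubic has as many coefficients as there are data points and therefore passes through all of them, so its $R^{2}$ equals 1 by construction; comparisons between the model orders are more informative at the linear and quadratic levels.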

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
