Exploring Routes to Enhance the Calculation of Free Energy Differences via Nonequilibrium Work SQM/MM Switching Simulations by Using Hybrid Charge Intermediates Between MM and SQM Level of Theory or Non-linear Switching Schemes

Andreas Schoeller; H Lee Woodcock; Stefan Boresch

doi:10.20944/preprints202303.0483.v1

Submitted:

27 March 2023

Posted:

28 March 2023

You are already at the latest version

Abstract

Nonequilibrium work switching simulations and Jarzynski’s equation are a reliable method to compute free energy differences, ΔAlow→high, between two levels of theory, such as a pure molecular mechanical (MM) and a quantum mechanical/molecular mechanical (QM/MM) description of a system of interest. Despite the inherent parallelism, the computational cost of this approach can quickly get very high. This is particularly true for systems where the core region, the part of the system to be described at different levels of theory, is embedded in an environment, such as explicit solvent water. We find that even for relatively simple solute–water systems, switching lengths of at least 5 ps are necessary to compute ΔAlow→high reliably. In this study, we investigate two approaches towards an affordable protocol, with emphasis on keeping the switching length well below 5 ps. Inserting a hybrid charge intermediate state with modified partial charges which resembles the charge distribution of the desired high level makes it possible to obtain reliable calculations with 2 ps switches. Attempts using step-wise linear switching paths, on the other hand, did not lead to improvement, i.e., faster convergence for all systems. To understand these findings, we analyzed the solutes’ properties as a function of partial charges used, the number of waters in direct contact with the solute, and studied the time needed for water molecules to reorient themselves upon a change in the solute’s charge distribution.

Keywords:

free energy

;

indirect thermodynamic cycle

;

non-equilibrium simulation

Subject:

Engineering - Energy and Fuel Technology

1. Introduction

Since the first applications reported shortly before 1990 [1,2], so-called alchemical free energy simulations (FES) have become an essential tool of computational chemists in academia and industry alike [3,4,5,6]. The free energy difference determines the spontaneity of chemical or biochemical processes and, hence, makes it possible to predict, e.g., binding affinities of potential drugs [7,8,9]. In addition to computationally estimating an important macroscopic, thermodynamic quantity, FES can also provide insight in the microscopic origins [10]. One source of error is the accuracy with which interactions within and between molecules are modeled in the underlying molecular dynamics simulations. In certain applications, force-field based descriptions may be insufficient [11]. Even if one is not interested in studying chemical reactions, the neglect of induced electronic polarization in classical force fields may lead to problems even for seemingly simple applications, such as computing relative free energies of hydration of mono- or bivalent ions [12,13,14]. In such situations, hybrid quantum-mechanical/molecular mechanical (QM/MM) Hamiltonians are called for. However, even when using only semiempirical quantum chemical methods (henceforth abbreviated as SQM) for the core region, i.e., the part of the system one is particularly interested in, the computational cost can quickly become prohibitive. Further, several tricks frequently used in alchemical FES do not work with SQM/MM Hamiltonians [15]. Early on, Warshel, Gao, and others pioneered the use of indirect cycles, also referred to as “multilevel” free energy simulations, as depicted in Figure 1a), which has become a widely used way to carry out FES with a (S)QM/MM description of interactions [16,17,18].

To illustrate the indirect approach, we consider the calculation of the aqueous solvation free energy of a solute X. As can be seen in Figure 1a, the quantity of interest, the solvation free energy difference

Δ A_{X g a s \to s o l v}^{S Q M / M M}

at the high SQM/MM level of theory (dashed, black arrow in Figure 1a), can also be obtained in three steps according to

Δ A_{X g a s \to s o l v}^{S Q M / M M} = - Δ A_{X g a s}^{M M \to S Q M} + Δ A_{X g a s \to s o l v}^{M M} + Δ A_{X s o l v}^{M M \to S Q M / M M}

(1)

In addition to computing the solvation free energy

Δ A_{X g a s \to s o l v}^{M M}

at the force field (MM) level of theory (solid black arrow), two correction steps, accounting for the free energy difference of X in gas phase and aqueous solution between the two levels of theory (green and red arrows), are required. The development in this field during recent years shows that there is not only enormous potential but also a high interest in the FES community to use this strategy to compute free energy differences at the SQM/MM level of theory [19,20,21,22,23,24,25,26,27].

Work by Heimdal and Ryde [28] made clear that the challenging step of indirect cycle SQM/MM FES is the calculation of the low-to-high corrections

Δ A_{X g a s}^{M M \to S Q M}

and

Δ A_{X s o l v}^{M M \to S Q M / M M}

, the green and red arrows of Figure 1a. Various approaches have been used, often requiring sophisticated decomposition of the cycle and the employment of fitting procedures to be successful [18,29]. In the past, we showed that the corrections

Δ A^{M M \to S Q M}

can be computed reliably and accurately by non-equilibrium work (NEW) methods [30,31,32,33,34,35]. In our work to date, computationally expensive protocols were used, which are unsuited for most real-world applications. Recently, we optimized the NEW protocols regarding switching length and number of switches and presented a sufficiently efficient workflow for practical applications [36]. However, these optimized protocols were tested so far only in the gas phase, i.e., the green arrow of Figure 1a (

Δ A_{X g a s}^{M M \to S Q M}

).

Here, we investigate whether the protocols suggested in Ref. 36 are also suitable for aqueous solution, i.e., when the region of interest to be treated at a high level of theory, e.g., SQM, interacts with an MM environment (

Δ A_{X s o l v}^{M M \to S Q M / M M}

, red arrow in Figure 1a). In earlier work, we had noted that NEW switching simulations from a pure MM to a SQM/MM level of theory converged more slowly compared to gas phase transformations involving only MM to SQM and observed that the charge distribution resulting from the fixed charges of the force-field and the average charge distribution obtained at the SQM/MM level of theory were noticeably different [31]. We speculated that the necessary reorientation of water in response to such a change in charge distribution of the solute might be responsible for the slower convergence. Using equilibrium methods to compute

Δ A_{X s o l v}^{M M \to S Q M / M M}

, Ito and Cui reported that inserting an intermediate state with fixed charges more similar to the charge distribution at the SQM/MM target state improved convergence considerably [37].

The solvent-reorientation dynamics of water in response to charge changes is experimentally well characterized and understood [38,39]. Two relaxation times, one faster than 0.5 ps, the other approximately 2 ps, can be discerned. Therefore, if the charge distribution of the region of interest is significantly different at the two levels of theory, NEW switching simulations of only 2 ps, as used in Ref. 31, may be too short to obtain converged results. With this in mind, we tested two strategies: (i) Similarly to the work by Ito and Cui, we introduce an intermediate state, which is still purely MM, but with the atomic partial charges modified to resemble the average charge distribution of the SQM description at the high level of theory. (ii) Since water reorientation in response to a change in charge distribution occurs on two timescales, the linear switching protocols we have used so far may be suboptimal. In particular, we test whether protocols in which switching is carried out initially at a faster rate, e.g., switching from the MM to a 35% SQM/MM description in just 10% of the switching length, followed by a second slower phase during the remaining 90% of the switching length, might lead to improved convergence. Obviously, both strategies can be combined. Model calculations exploring these two approaches are complemented by a series of analyses in which we investigate the effects of a solute’s charge distribution on properties such as its dipole moment, as well as on the interactions with surrounding waters.

The focus of this work is on the effect which the charge distribution of a solute described at different levels of theory has on the convergence of calculating

Δ A_{X s o l v}^{M M \to S Q M / M M}

. To avoid unrelated complications from too flexible molecules, where conformational preferences might be different at the low and high level of theory, respectively, we mostly chose rather rigid model compounds for the test calculations. Rather than using (a subset of) the so-called “HiPen” test set [34], we selected tautomeric pairs (cf. Methods). Most of the compounds are rather rigid without rotatable bonds. Therefore, the pure MM and the SQM/MM descriptions of the systems — the solute (tautomer) is described by SQM, all waters remain classical — differed primarily by the charge distributions of the solute at the two levels of theory.

The remainder of the manuscript is organized as follows. In Theory and Methods, we summarize the theoretical background, introduce the model systems, and provide the technical details of all simulations and analyses carried out. In Results, we first study the convergence of NEW switching simulations using the optimized protocol from our preceding study [36] (switching lengths of 2 ps, 200 NEW switches per transformation). We then investigate how convergence

Δ A_{X s o l v}^{M M \to S Q M / M M}

improves when employing switching lengths of 5 ps, which, however, may be too costly for many applications. Next, we present results using the two mitigation strategies outlined above. Our findings are rationalized by additional data characterizing the differences in solute properties, in particular the dipole moment, at the two levels of theory, as well as details about the solvent-reorientation dynamics of water under the simulation conditions.

2. Materials and Methods

2.1. Computing free energy differences between levels of theory by NEW methods

In several previous studies [30,31,32,33,34,35,36] we demonstrated that NEW methods are a robust means to compute free energy differences between levels of theory, such as an MM and SQM/MM description of a system. Given a series of non-equilibrium work

W^{M M \to S Q M / M M}

values obtained from switching simulations, the free energy difference is computed according to Jarzynski’s equation [40]

Δ A^{M M \to S Q M / M M} = - k_{B} T ln 〈exp (- β W^{M M \to S Q M / M M})〉

(2)

where

k_{B}

, T and

β

have the usual meaning of Boltzmann’s constant, absolute temperature and

β = 1 / k_{B} T

. The angular brackets indicate averaging over a sufficiently large number of individual work values. To realize this scheme, one has to save snapshots during an equilibrium MD simulation at the MM level of theory. These serve as the starting point for the switching simulations using a hybrid energy function

U (λ_{t}) = (1 - λ_{t}) U^{M M} + λ_{t} U^{S Q M / M M}

(3)

Here

0 \leq λ_{t} \leq 1

is the coupling parameter which is incremented, e.g., linearly at each step of the switching simulation according to

λ_{t} = λ_{t} (i) = i / N_{s w i t c h}, i = 1, 2, \dots, N_{s w i t c h}

(4)

In our protocols,

N_{s w i t c h}

is, e.g., 2,000 or 5,000 steps, corresponding to a switching length of 2 or 5 ps with a timestep

δ t = 1

fs, and we carry out two hundred of such switches (see below for details). Eq. 4 describes the simplest switching protocol; in this work, we also test modified protocols where

λ_{t}

is incremented at different speeds. The switching simulations are costly because the evaluation of the mixed potential energy function

U (λ_{t})

requires the high-level energy function

U^{S Q M / M M}

and its gradient. However, since the switches are independent, they can be carried out in parallel. For each timestep at which

λ_{t}

is incremented, work is accumulated according to (cf. Ref. [41])

W (t + δ t) = W (t) + U (x_{t + δ t}, λ_{t + δ t}) - U (x_{t + δ t}, λ_{t})

(5)

One-sided methods, such as Jarzynski’s equation, are less efficient and reliable than two-sided approaches [42]. The process outlined above to obtain

W^{M M \to S Q M / M M}

can be reversed, so one ends up with distributions

P^{M M \to S Q M / M M} (W)

of work values in the forward and

P^{S Q M / M M \to M M} (W)

in the backward direction. Crooks showed that these two distributions are related according to [43]

P^{M M \to S Q M / M M} (W) = P^{S Q M / M M \to M M} (- W) exp [β (W - Δ A^{M M \to S Q M / M M})]

(6)

from which one can deduce the free energy difference between the two states, i.e., between the two levels of theory. The notation in Eq. 6 has been adapted to the present application. The practical use of Crooks’ equation is computationally costly, since one needs to generate equilibrium configurations at the high level of theory. As in the past, Crooks’ equation will, therefore, be only used to generate reference results to assess the convergence of free energy differences obtained by Jarzynski’s equation. In the following, Jarzynski’s and Crooks’ equations (or their use) are abbreviated as JAR and CRO, respectively.

The mixing of Hamiltonians as required by Eq. 3 was realized with the MSCALE module of CHARMM [44], which makes it possible to combine energies and forces from different sources, including separate programs, in an almost arbitrary manner. The MSCALE functionality is coupled to one of CHARMM’s tools (PERT) to compute free energy differences [45]. Realizing that short/fast slow-growth thermodynamic integration calculations yield a non-equilibrium work rather than a converged free energy differences [46], one can exploit PERT’s slow-growth mode to carry out NEW switches. Using PERT, it is also possible to carry out stepwise linear switching protocols; see Stepwise linear switching protocols below.

2.2. Choice of model systems

This study focuses on the optimization of the calculation of

Δ A_{X s o l v}^{M M \to S Q M / M M}

(red arrow in Figure 1a), in other words, the free energy difference between a system, in which all interactions are described by classical mechanics, and a hybrid system, in which a small region of interest is described by SQM, and the remainder described by MM. Examples are solute–solvent systems, in which the solute is modeled either by molecular mechanics or by quantum chemistry, whereas the solvent (water) is always treated classically. As outlined earlier, difficulties may arise from the differences in charge distribution of the MM and SQM representation of the solute molecule X. To exclude other factors hindering convergence, suitable model systems should be relatively rigid and not have multiple conformational substates.

Many tautomer pairs fulfil this requirement. We, therefore, picked 7 such pairs from the tautomer part of SAMPL2 challenge [47]. One additional pair was taken from Tautobase [48]. We also included N-acetyl-alanine-methylamide (Ala) and N-acetyl-serine-methylamide (Ser) as model solutes, since these were the two compounds for which we noticed slow convergence in Ref. [31]. All model compounds used in this work are shown in Figure 2.

The prediction of tautomer preferences in aqueous solution remains a challenging problem [49] and is out of scope for this study. Here we concentrate on finding efficient protocols to compute the correction

Δ A_{X s o l v}^{M M \to S Q M / M M}

, the red arrow in Figure 1. Since (S)QM/MM methods are likely necessary to answer questions involving tautomer preferences, the present work establishes the necessary methodology to enable such calculations efficiently.

2.3. Strategies to improve convergence of NEW simulations

2.3.1. Hybrid charge intermediates

As outlined in the Introduction, cf. Figure 1b, we investigated whether the quantity of interest

Δ A_{X s o l v}^{M M \to S Q M / M M}

can be computed more efficiently by inserting an intermediate state. We refer to it as “MULL”, the rationale for this label will become clear shortly. Thus, we have

Δ A_{X s o l v}^{M M \to S Q M / M M} = Δ A_{X s o l v}^{M M \leftrightarrow M U L L} + Δ A_{X s o l v}^{M U L L \to S Q M / M M}

(7)

This intermediate state should have a charge distribution resembling that of the SQM/MM end state. To be useful in practice, the free energy difference between the initial MM and the intermediate state

Δ A_{X s o l v}^{M M \leftrightarrow M U L L}

must be easy (=fast) to compute. In practice, this means that the interactions at the intermediate state must also be classical. The obvious choice, therefore, is to use the force field of the initial MM state with suitably modified partial charges.

In all our methodological work to date [30,31,32,33,34,35,36], we used the SCC-DFTB SQM method [50,51] as the high level of theory. On the one hand, SCC-DFTB is expected to be sufficiently similar to ab initio QM methods so that insights obtained will be transferable to full DFT QM methods. On the other hand, SCC-DFTB is fast enough to explicitly carry out simulations at the high level of theory, something which is not feasible for ab initio QM methods. The ability to carry out simulations at the SQM/MM level of theory makes it possible to obtain rigorous reference results using Crooks’ equation; cf. the previous section.

Since Mulliken charges and Mulliken charge analysis are at the core of the SCC-DFTB methodology [50,51], a logical choice for the intermediate state is the combination of the MM force field (see below for details) with fixed partial charges derived from the Mulliken charges used at the SCC-DFTB level of theory; hence the superscript MULL in Figure 1b and Eq. 7. In SCC-DFTB calculations, the converged Mulliken charges change at every simulation step; therefore, an averaging procedure is needed. There are several possibilities how to obtain a representative sample of configurations, for which Mulliken charges are computed. The following three approaches are tested in this study: (i) Clearly, the most exact method is to use simulations at the high level of theory (either just SCC-DFTB for the gas phase, or the solute treated by SCC-DFTB, surrounded by MM waters) to obtain configurations for which the Mulliken charges are calculated and then averaged. All simulations and results using this intermediate state are labelled MULL(solv*). Our goal, however, is to avoid simulations at the high level of theory as much as possible. Therefore, we also used configurations sampled at the MM level of theory to derive Mulliken charges for the intermediate representation. Specifically, we used (ii) the Mulliken charges obtained from snapshots of the MM gas phase simulation of the solute (labelled as MULL(gas)), or (iii) used snapshots from the corresponding MM simulations in water to obtain Mulliken charges for the solute (labelled as MULL(solv)).

The three sets of MULL charges were obtained by averaging over configurations saved during equilibrium MD simulations, i.e., MULL(gas) charges were derived from the MM gas phase simulations, MULL(solv) from the MM simulations in solutions, and MULL(solv*) from SQM/MM simulations in solution. These are the simulations needed to save restart files for the NEW switching simulations to the respective other level of theory. As described below (cf. Simulation details), we always carried out 8 repetitions, i.e., 8 MD simulations per state and system. From each of these, 25 configurations were taken and reevaluated at the SQM(/MM) level of theory, writing out the converged Mulliken charges. Thus, each MULL charge set was obtained by averaging over 200 configurations.

If one inserts one of the MULL intermediate states to compute

Δ A^{M M \to S Q M / M M}

as described above, then one also must compute

Δ A_{X s o l v}^{M M \leftrightarrow M U L L}

to close the thermodynamic cycle (cf. Figure 1b). Since the MM and the MULL states differ only in the partial atomic charges, we calculated

Δ A_{X s o l v}^{M M \leftrightarrow M U L L}

by equilibrium free energy methods. Specifically, in addition to the end states we used three equidistant intermediate states, and calculated the free energy difference by Bennett’s acceptance ratio method (BAR) [52].

2.3.2. Stepwise linear switching protocols

All protocols employed to switch from MM to SQM/MM are quite rapid, so the system will not remain in equilibrium. In the context of slow-growth thermodynamic integration, this has been referred to as Hamiltonian lag and shown to lead to poorly converged results [46,53,54]. Although JAR [40] makes it possible to obtain the equilibrium free energy difference for a process from multiple irreversible work values, the convergence of the results depends on the variance of the work values, which in turn depends on the deviation of the switching trajectories from equilibrium conditions [41,55].

In addition to inserting an intermediate state with partial charges resembling the high level of theory, as just described, we explored the use of modified switching protocols. Results from solvation spectroscopy [38,39] indicate that the faster process of water reorientation in response to a change in charge distribution of a solute is completed after approximately 0.5 ps. We, therefore, tested piecewise linear switching protocols consisting of two or three stages, referred to as L2-X and L3-X, respectively. The three protocols investigated in detail are shown in Figure 3, which also includes the default linear protocol L1 of Eq. 4 (leftmost plot). In all cases, the total switching length

τ

is 2,000 fs. In L2-1, instead of incrementing

λ_{t}

linearly, there are two stages: first,

λ_{t}

is incremented from

λ_{t} = 0.00 \to 0.35

in 200 fs, followed by slower linear switching from

λ_{t} = 0.35 \to 1.00

over 1,800 fs. The two L3-X protocols use three stages. In L3-1,

λ_{t}

is switched from 0.0 to 0.35 to 0.8 to 1.0 over 200, 900, and 900 fs, respectively. Finally, L3-2 is very similar, but

λ_{t}

is switched from 0.0 to 0.35 to 0.8 to 1.0 over 200, 400, and 1,400 fs, respectively. In all modified protocols,

λ_{τ}

is initially changed more rapidly compared to L1, followed by a slower change in

λ_{τ}

during the remainder of the switch.

To verify the applicability and performance of the modified compared to the standard linear switching protocol, systematic tests were carried out for five compounds (1-t2, 4-t2, 5-t2, 6-t2 and 8-t1). This selection was motivated by the results obtained with the linear protocol and includes both “easy” and “difficult” solutes with respect to convergence (cf. Results, Performance of 2 and 5 ps linear switching protocols in solution).

2.3.3. Analyses carried out

The main focus of this study lies on computing

Δ A_{X s o l v}^{M M \to S Q M / M M}

efficiently, and most of our results are concerned with the accuracy of JAR-based protocols compared to reference results obtained with CRO. Additional analyses and characterizations were carried out to understand how solute–solvent interactions change and, thus, affect convergence when switching from an MM to a SQM representation of the solute.

Characterizing charge distributions

As described in Hybrid charge intermediates, we use three intermediate charge distributions to facilitate the transformation from the low to the high level of theory. It is, therefore, of interest to quantify the differences between the charge sets used. In the following, the reference charge set is always the one derived from averaging over the Mulliken charges obtained from SQM/MM trajectories [MULL(solv*)]. To quantify the difference between two charge sets, we computed the root-mean-square deviation RMSDq, defined as

RMSDq = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(q_{i}^{m e t h o d} - q_{i}^{M U L L (s o l v *)})}^{2}}

(8)

Here, N is the number of atoms of the molecule, the

q_{i}^{M U L L (s o l v *)}

charges serve as the reference, and the

q_{i}^{m e t h o d}

are one of the other charge sets used, i.e., method can be MM, MULL(gas), or MULL(solv).

Because all our solutes are neutral, the most important quantity affected by a change in partial charges is the dipole moment

\vec{μ}

. To compare the dipole moments resulting from the various charge sets, we consider the “differential” dipole moment between two charge distributions of a solute, i.e,

Δ \vec{μ} = \sum_{i = 1}^{N} (q_{i}^{m e t h o d} - q_{i}^{M U L L (s o l v *)}) \cdot \vec{r_{i}}

(9)

In practice, we mostly compare the length of this vector, i.e.,

|Δ \vec{μ}| = Δ μ

.

Furthermore, the angle between the dipole moment vectors of two charge representations is of interest, which can be obtained in the usual way as

Θ_{SQM} = {cos}^{- 1} (\frac{{\vec{μ}}^{m e t h o d} \cdot {\vec{μ}}^{M U L L (s o l v *)}}{| {\vec{μ}}^{m e t h o d} | \cdot | {\vec{μ}}^{M U L L (s o l v *)} |})

(10)

Dipole moments were computed with the COOR DIPOle command of CHARMM; 1 if the difference between two charge sets is assigned as partial charges, one obtains the corresponding differential dipole moment.

Characterization of the first solvation shell

We analyzed all MM, MULL(gas), MULL(solv), MULL(solv*), and SQM/MM equilibrium trajectories to determine the average number of solvent waters in proximity to the solute. Specifically, a water molecule was considered to be in close contact if the distance between the water oxygen and a non-hydrogen atom of the solute was

\leq 3

Å. These analyses were carried out with the atom selection facility of CHARMM 2.

Dynamics of solvent reorientation

To investigate the detailed dynamics of solvent reorientation upon changing the description of the solute from the MM to the SQM level of theory, we resort to ideas from computational spectroscopy, specifically the computation of the normalized Stokes shift

S (t)

from non-equilibrium MD simulations.

S (t)

is defined as [56,57]

S (t) = \frac{ν (t) - ν (\infty)}{ν (0) - ν (\infty)} \approx \frac{\bar{Δ U (t)} - \bar{Δ U (\infty)}}{\bar{Δ U (0)} - \bar{Δ U (\infty)}}

(11)

where

ν (t)

is the time-dependent frequency of the emitted fluorescence light of the probe molecule’s chromophore. In computation, one assumes that changes in interactions between the solute and the solvent leading to

ν (t)

are purely electrostatic in nature. Thus, one takes equilibrated configurations from solute–solvent simulations, changes the partial charges from those describing the ground state to those describing the excited state, restarts the simulations, and monitors the difference in electrostatic solute–solvent interactions between the ground (

U^{0}

) and the excited state (

U^{*}

),

Δ U (t) = U^{*} (t) - U^{0} (t)

, as a function of time. Just as the measured fluorescence signal is an average of the emissions from all the chromophores present in solution, one averages over at least a few hundred simulations, hence the bar in

\bar{Δ U (t)}

.

In the context of our work, the low- and high-level representations of the solute (region of interest) play the role of the ground and excited states in solvation spectroscopy. In both cases, solvent water has to reorient itself following a change in charge distribution of the solute. We, therefore, adopted the computational approach to compute

S (t)

[56,57] as follows. We used starting configurations equilibrated either at (i) the MM level of theory or (ii) the MULL(solv) intermediate state, and restarted simulations with the full SQM/MM description of interactions. For each system studied (4-t2,5-t2,6-t2 and 8-t2), we computed 800 simulations of 10 ps length; based on the experimental findings it is reasonable to expect that solvent reorientation is completed after this time [38,39]. At each timestep, we saved

Δ U (t) = U^{S Q M / M M} (t) - U^{m e t h o d} (t)

, where method was either MM or MULL(solv). This task was facilitated by a locally modified version of CHARMM, but could equally well be accomplished through post-processing of trajectories saved during the simulations. The 800

Δ U (t)

time series were then averaged, resulting in the averaged time series

\bar{Δ U (t)}

. Next, we estimated

\bar{Δ U (\infty)}

, i.e., the limit of

t \to \infty

, by averaging (again) over the last 2,000 entries of

\bar{Δ U (t)}

(

8 ps \leq t \leq 10 ps

).

Eq. 11 is the definition of the normalized Stokes shift, whereas in the present context omitting the normalization turned out to be advantageous. Thus, we operate with the averaged time series,

\bar{Δ U (t)} = \bar{Δ U (t)} - \bar{Δ U (\infty)},

(12)

which essentially is an un-normalized Stokes shift. To interpret it more easily,

\bar{Δ U (t)}

was fitted to a mono-exponential target function

\bar{Δ U (t)} \propto Δ U_{0} \cdot e^{- t / τ}

(13)

where

Δ U_{0}

is the prefactor of the exponential decay at

t = 0

, t is the time in fs, and

τ

the relaxation time constant in fs. As described in the Introduction, it is known experimentally that orientation of solvent water occurs on two timescales [38,39]. Because any effects on the convergence of JAR calculations will result from the slower process, we intentionally always discarded the first 0.5 ps when carrying out the fit. Thus, the

τ

found by the fitting procedure should correspond to the slower reorientation process.

2.4. Overview of simulations carried out

All simulations were carried out with CHARMM (developmental versions c44a2 and c47a1) [45]. The calculation of

Δ A_{s o l v}^{M M \to S Q M / M M}

requires equilibrium simulations to generate the configurations from which the NEW switching simulations are started. As described above, in addition to switching directly from the MM to the SQM/MM level of theory, we also inserted three force field-based intermediate states with different atomic partial charges. For each of these states, we also had to carry out equilibrium simulations, followed by NEW switches to the SQM/MM level of interest. The NEW switching simulations themselves were carried out either in a strictly linear fashion (protocol L1 in Figure 3), or with a two- or three-staged linear protocol. To make sure that any convergence problems observed do not arise without solvent, we also carried out gas phase simulations for all compounds shown in Figure 2 using the optimized protocol of Ref. [36].

2.4.1. Simulation details

Preparation and initial equilibration

For each molecule shown in Figure 2, starting coordinates were generated using CHARMM-GUI [58,59]. Missing force field parameters were generated using CGenFF 2.4.0 [60,61,62] as invoked by CHARMM-GUI. The solute–solvent systems were set up by placing each molecule into a cubic box of water molecules with an initial side-length of 26 Å containing 572 (CHARMM modified) TIP3 waters [63]. Any water molecule overlapping with a solute atom was deleted. Each system was then equilibrated as follows: 100 steps of steepest descent minimization were followed by a constant pressure/temperature (CPT) simulation of 100 ps length with a timestep of 1 fs applying the Langevin piston barostat [64] (mass of pressure piston 400 Da, Langevin piston collision frequency 20

{ps}^{- 1}

, Langevin piston bath temperature 300 K). The final 20 ps of these equilibration runs were used to determine the average box size. In Table S1 of SI, we list the size of the simulation box determined in this manner, as well as the number of water molecules present, for each of the compounds studied.

Force field-based equilibrium simulations

Starting from the initial solute coordinates and the equilibrated solute-solvent systems, 8 Langevin dynamics (LD) simulations (timestep 1 fs, friction coefficient 5

{ps}^{- 1}

) were carried out in the gas phase and in solution, respectively. Each of these simulations was initialized with different random velocities drawn from a Maxwell-Boltzmann distribution at 300 K. In each gas phase run, 5 ns of simulation time were discarded as equilibration, followed by 10 ns of production, during which restart files were saved every 1,000^th step. In the analogous solution simulations at constant volume, 0.5 ns were discarded as equilibration. During the subsequent 1 ns production phase, restart files were saved every 1,000^th step. Thus, during the cumulative simulation length of 8 ns (solution)/80 ns (gas phase), 8,000 (solution)/80,000 (gas phase) restart files were saved. These served as the pool of configurations sampled in the canonical ensemble, from which non-equilibrium switching simulations to the high (SQM(/MM)) level of theory were started.

The solute molecules were always fully flexible; the TIP3 waters were held rigid by SHAKE [65]. In the gas phase, the nonbonded interactions were not truncated (“infinite” cutoff radius). In the solution simulations, Lennard-Jones interactions were smoothly switched off between 10 and 12 Å, and electrostatic interactions were computed with the particle-mesh-Ewald method [66] (

κ = 0.34

Å^-1, spline interpolation to order 6, FFT grid size of 32).

The protocol just described was used both for the simulations at the MM level of theory, i.e., using the CGenFF force field, and for all simulations with a hybrid intermediate representation (force field with Mulliken derived partial charges).

SQM(/MM) equilibrium simulations

To calculate reference values via Crook’s equation, we also carried out simulations at the SQM(/MM) level of theory. Specifically, the solute was treated with the self-consistent-charge density-functional tight-binding (SCC-DFTB) method as implemented in CHARMM [67], using the 3ob-3-1 [68,69,70,71] parameter set.3 Waters were always treated classically. Analogously to the MM case just described, eight LD simulations started with different random initial velocities were carried out in the gas phase and in solution. The simulation length in all cases was 1 ns (1 million steps with a timestep of 1 fs). Restart files were written every 1,000^th step in solution and every 100^th step in the gas phase, thus resulting in a total of 800 (solution)/8,000 (gas phase) restart files generated during a cumulative simulation length of 8 ns. Nonbonded interactions were treated identically as in the MM case.

2.4.2. Non-equilibrium work simulations

From each set of restart files written during the equilibrium simulations at the MM, MULL, and SQM(/MM) levels of theory, every 40^th was selected as the starting point for a NEW switch to the respective other level of theory. Thus, 200 switches per molecule were carried out to compute the free energy difference between two levels of theory. In all switching simulations, we used a timestep of 1 fs. Unless otherwise noted, the switching length was 2000 fs (2,000 steps), both for linear and stepwise linear switching paths.

2.4.3. Calculation of $Δ A_{X s o l v}^{MM \leftrightarrow MULL}$ :

A described in Hybrid charge intermediates, to close the thermodynamic cycle, one needs to compute

Δ A_{X s o l v}^{MM \leftrightarrow MULL}

. Since the only difference between the

M M

and

M U L L

descriptions of a system are the partial charges of the solute,

{q_{i}}

, intermediate states can be set up easily according to

q_{i} (λ) = (1 - λ) q_{i}^{M M} + λ q_{i}^{M U L L}

. To ensure overlap between neighboring states, we used three intermediate states (

λ = 0.25, 0.5, 0.75

) in addition to the MM (

λ = 0

) and MULL (

λ = 1

) end states. All simulation settings were the same as described in Force field-based equilibrium simulations; the production length per state was 1 ns. At each

λ

state, 1,000 coordinate sets were saved and energies were reevaluated at the neighboring states. The free energy difference between neighboring states was computed with BAR; these were summed up to give

Δ A_{X s o l v}^{MM \leftrightarrow MULL}

. All calculations were repeated 8 times, starting from initial random velocities. The standard deviation obtained from these 8 repetitions were used to estimate the standard error by Gaussian error propagation. All MM calculations were carried out with the OpenMM [72] GPU acceleration available in CHARMM 4.

2.5. Overview of calculated free energy differences and paths

In Results, several approaches to calculate the free energy difference between two levels of theory are compared. To specify the method and the exact meaning of a free energy difference, we adopt the following labelling scheme:

Δ A_{X p h a s e}^{level of theory I \to / \leftrightarrow level of theoy II}

Here the subscript X indicates the compound; the qualifier phase can either be “gas” (gas phase) or “solv” (solvated phase). In the superscript, level of theory I and level of theory II can be “MM”, “MULL(gas)”, “MULL(solv)”, “MULL(solv*)”, or “SQM(/MM)”. MM indicates the use of the regular force field. The abbreviation SQM indicates that the SCC-DFTB semi-empirical method was used to compute interactions, either for the full system (gas phase), or for the solute (aqueous solution, waters were always described classically). The labels MULL(gas), MULL(solv), and MULL(solv*) refer to three methods to derive average Mulliken charges described in Hybrid charge intermediates above.

A simple arrow → stands for a one-sided method to compute the free energy difference; in this work, this always means JAR. The double-pointed arrow ↔ indicates the use of a two-sided method. As described earlier, we used CRO to compute reference results. Furthermore, the free energy differences between MM and the intermediate state with modified partial charges, i.e., MULL(gas), MULL(solv), or MULL(solv*), were computed using BAR.

As in our previous work, all results are reported relative to a reference result; i.e., instead of reporting, e.g.,

Δ A_{X p h a s e}^{M M \to S Q M}

, we report the corresponding double free energy difference

δ Δ A_{X p h a s e} = Δ A_{X p h a s e}^{M M \to S Q M} - Δ A_{X p h a s e}^{M M \leftrightarrow C R O_{R e f}}

. In the past, reference results were obtained with CRO based on 200 2 ps forward and backward NEW switches. As described further in Results, we decided to use CRO results obtained from 5 ps switches as reference(

{CRO}_{Ref}

). The exact definition of

δ Δ A_{X p h a s e}

depends on whether the free energy difference between two levels of theory is computed in a single step, i.e.,

δ Δ A_{X p h a s e} = Δ A_{X p h a s e}^{MM \to / \leftrightarrow SQM} - Δ A_{X p h a s e}^{{CRO}_{Ref}}

(14a)

or via an hybrid charge intermediate state. Here, we have

δ Δ A_{X p h a s e} = Δ A_{X p h a s e}^{MM \leftrightarrow MULL} + Δ A_{X p h a s e}^{MULL \to SQM} - Δ A_{X p h a s e}^{{CRO}_{Ref}}

(14b)

to account for the correction step between the MM representation and the hybrid Mulliken charge intermediate state; cf. Eq. 7.

3. Results

3.1. Performance of 2 and 5 ps linear switching protocols in solution

Before carrying out calculations in aqueous solution, the optimized JAR protocol from Ref. [36] was tested for all compounds in the gas phase. All results can be found in Table S3 of SI. Except for 1-t1 (see further down),

|δ Δ A|

was

< 0.1

kcal/mol. While

δ Δ A (1 - t 1) \approx 0.35

kcal/mol is slightly larger than the ideal maximum deviation of

\pm 0.25

kcal/mol, this value is still acceptable for most practical applications. Thus, the protocol recommendations of our previous study [36] are applicable to the model compounds studied here. Based on the gas phase results, any noticeable deterioration of

δ Δ A

, therefore, has to be caused by differences in the description of solute–solvent interactions at the two levels of theory.

In Figure 4, the

δ Δ A

values for all compounds obtained by three protocols/methods are compared against reference results obtained from 200 forward/backward switches of 5 ps length: 200 MM to SQM switches of 2 ps length, the recommended protocol from Ref. [36], (JAR(2ps), red circles), 200 MM to SQM switches of 5 ps length (JAR(5ps), orange circles), and CRO results obtained from 200 forward/backward switches of 2 ps length (CRO(2ps), blue circles). The data visualized in Figure 4 are tabulated in Table S2; even more detailed results can be found in Tables S4 and S5. For the shortest protocol, JAR(2ps), four results lie outside the

\pm 0.25

kcal/mol threshold indicated by the two dashed lines in the figure. Taking error bars into account (see Table S2), the results for 1-t1 and 8-t2 can be considered acceptable,

|δ Δ A| < 0.4

kcal/mole; furthermore,

δ Δ A

(1-t1) is comparable to the value obtained in the gas phase (see Table S3). However, for 5-t2 and 6-t2 the deviation from the reference result is larger than 1 kcal/mol and would lead to a sizable systematic error.

The use of a 5ps switching length (JAR(5ps), orange circles) certainly helps, but one poor result, 6-t2,

δ Δ A \approx 0.8

kcal/mol, remains. Thus, the relatively inexpensive protocol suggested in Ref. [36] cannot be recommended for calculations in solution. Even worse, the use of more costly switching simulations of 5 ps length does not help in all cases.

Figure 4 also shows results for the CRO(2ps) protocol (blue circles). Compared to our findings in the gas phase [36], the differences to the reference result CRO(5ps) are noticeably more pronounced. Furthermore, as can be seen in Tables S4 and S5, the statistical uncertainty of the CRO(2ps) results was consistently higher than those of the CRO(5ps) results, even though the overlap between forward and backward work distributions seemed adequate in all cases. While all CRO(2ps)

δ Δ A

values lie within the

\pm 0.25

kcal/mol threshold, we, nevertheless, decided to choose CRO(5ps) as the reference protocol.

Three of the four compounds failing the

\pm 0.25

kcal/mol threshold using the JAR(2ps) protocol, 5-t2, 6-t2, and 8-t2, are lactams. Interestingly, the corresponding tautomeric lactim states do not cause any problems. The pair 1-t1/1-t2 belongs to the keto-enol class of tautomerism. For 1-t1

δ Δ A \approx 0.35

kcal/mol in both the gas phase and in aqueous solution; therefore, different conformational preferences at the two levels of theory may be responsible for the poor convergence when using switching lengths of 2 ps.

3.2. Performance of hybrid charge intermediates

In Figure 5 we summarize the performance of the indirect NEW switching protocols using a hybrid intermediate charge state in terms of

δ Δ A

, cf. Eq. 14b. As described in Methods, three approaches to obtain average Mulliken charges were tested: averaging over gas phase configurations (MULL(gas), over configurations sampled in aqueous solution at the MM level of theory (MULL(solv)), and over configurations sampled in aqueous solution at the SQM/MM level of theory (MULL(solv*)). The results obtained with the direct JAR(2ps) protocol (red circles), already shown in Figure 4 and henceforth referred to as just MM, are included as well.

Clearly, all three protocols employing an intermediate hybrid charge state perform much better than the direct NEW switches (MM), though not to the same degree. Using the MULL(gas) intermediate state (orange triangles), the

\pm 0.25

kcal/mol threshold for 5-t2 is still missed,

δ Δ A = 0.32

kcal/mol, and the result for Ala becomes worse (

δ Δ A = - 0.49

kcal/mol). For MULL(solv), blue squares, de facto all results lie within the

\pm 0.25

kcal/mol threshold (

δ Δ A (5 - t 2) = - 0.28

kcal/mol,

δ Δ A (Ser) = - 0.26

kcal/mol). It should be noted that taking the statistical uncertainty of the results into account, none of these deviations from the

\pm 0.25

kcal/mol threshold is statistically significant. Finally, all MULL(solv*) results (green diamonds) lie well within the

\pm 0.25

kcal/mol threshold.

Figure 5 shows that in most cases the use of a hybrid intermediate charge state lowers

δ Δ A

, and typically the use of MULL(solv*) leads to the best results, followed, in this order, by MULL(solv) and MULL(gas). To quantify this, we report in Table 1 the mean absolute deviation (MAD) for the data shown in Figure 5, together with the spread of

δ Δ A

for each of the protocols. Already the use of MULL(gas) lowers the MAD from 0.29 to 0.12 kcal/mol, the largest error

δ Δ A

reduces from 1.80 to 0.49 kcal/mol. The MULL(solv) intermediate state leads to a further improvement (

MAD = 0.09

, the largest error is

0.28

kcal/mol). Finally, the MAD for MULL(solv*) is 0.07 kcal/mol and the largest

δ Δ A

is as low as 0.20 kcal/mol.

The improvements resulting from the hybrid intermediate charge states are roughly inversely proportional to the computational effort to obtain the average Mulliken-like charges. The MULL(gas) and MULL(solv) charges are obtained from configurations at the MM level of theory; the required computations at the SQM(/MM) level of theory to obtain the Mulliken charges are slightly more costly for the solvated system. By contrast, to obtain the MULL(solv*) charges, sampling needs to be carried out at the high (SQM/MM) level of theory. Given that the improvements of using MULL(solv*) rather than MULL(solv) are relatively small, this leads to an additional computational cost that, at least for the systems studied here, seems not worthwhile. In other words, the use of MULL(solv) hybrid partial charges seems to be a good compromise between correctness and computational cost. Each of the indirect NEW switching protocols requires the calculation of

Δ A^{MM \leftrightarrow MULL}

; however, this is a strict force field-based free energy calculation, which can be computed fast and efficiently on GPUs using the CHARMM/OpenMM interface (cf. Methods).

3.3. Performance of modified switching protocols

Figure 6 summarizes the results obtained with stepwise linear switching protocols for a subset of the model compounds. In all cases, the baseline results, indicated as red, solid circles, are the free energy difference obtained from one-step MM → SQM switches, i.e., the MM/JAR(2ps) results already presented. For the first problematic case, 5-t2, all three stepwise linear protocols reduce

δ Δ A

considerably. However, in the case of 6-t2, it is difficult to speak of improvements. Two protocols, L2-1 and L3-1 lower

δ Δ A

, but the values are still far outside the

\pm 0.25

kcal/mol threshold. The third protocol, L3-2, which worked extremely well for 5-t2 actually increases

δ Δ A

for 6-t2 compared to L1, the default linear protocol. The other three compounds included in the subset were chosen as negative controls because they performed already well with the JAR(2ps) protocol. The performance of the stepwise linear protocol is comparable, except for one poor result for 4-t2 when using L3-1.

Turning to the combination of stepwise linear and indirect NEW switching protocols with the MULL(solv) hybrid intermediate charge state (shown as squares), most of the results lie within the

\pm 0.25

kcal/mol threshold. However, since in this case, the performance of the linear protocol was excellent to begin with, any improvements are marginal. Moreover, for 4-t2 the stepwise linear protocols perform slightly poorer, and for 6-t2

δ Δ A

obtained with the L3-1 is

> 1

kcal/mol.

Based on these findings, the performance of stepwise linear switching protocols seems mixed at best. None of the protocols tested consistently improved the results. Furthermore, even when the results were improved and fell within the

\pm 0.25

kcal/mol threshold, the standard deviation of the work values,

σ_{W}

, was always larger than for the linear switching protocol. Since

σ_{W}

is a sensitive indicator of whether one can expect convergence of JAR calculations [32], the utility of protocols that increase it seems doubtful.

3.4. A detailed analysis of the factors affecting convergence

3.4.1. Effects of charge distribution on solute properties

To quantify the differences between the various charge models used in this study, we calculated the root-mean-square deviation of the atomic charge differences

{RMSD}_{q}

between the MM, MULL(gas), MULL(solv) representations, using the MULL(solv*) charges as the reference value (cf. Eq. 8). Since the latter are the average of the Mulliken charges derived from simulations at the SQM/MM level of theory, they are closest to the fluctuating charge distribution at the target high level of theory. The variability of the Mulliken charges derived from the SQM(/MM) reevaluations about their mean values were always extremely low, fluctuating typically less than

\pm 0.03

e about the mean; the largest value observed was 0.08 e. These low standard deviations make the use of average values meaningful in the first place. In addition to

{RMSD}_{q}

, we looked at the magnitude of the differential dipole moment

|Δ \vec{μ}| = Δ μ

(Eq. 9) and the angle

Θ_{SQM}

between the dipole moment of a charge distribution and that of the MULL(solv*) reference charges (Eq. 10).

The individual results for each compound are listed in Table S12 and depicted in Figures S4–S6 of SI. Some overall trends can be seen from the MAD values of

{RMSD}_{q}

,

Δ μ

, and

Θ_{SQM}

in Table 2. As expected, the MAD(

{RMSD}_{q}

) is largest for MM and smallest for MULL(solv), with MULL(gas) in between; the same is the case for MAD(

Δ μ

). This is in accord with the MAD(

δ Δ A

) results of Table 1 in Performance of hybrid charge intermediates, where the use of the MULL(gas) intermediate charge state led to a noticeable improvement, and the MULL(solv) charges lowered the MAD(

δ Δ A

) even further. The MAD(

Θ_{SQM}

) behaves somewhat differently; here the MULL(gas) value is lower than that of MULL(solv). As can be seen in Table S12 and Figure S6, this holds true for most compounds. Nevertheless,

Θ_{SQM}

of MULL(solv) is always smaller than that of the MM charges. The findings strongly suggest that the differences between charge distribution at the two levels of theory have a strong influence on the convergence of calculations of

Δ A^{high \to low}

by JAR, with the difference in magnitude of the two dipole moments being more important than the orientation of the respective dipole moment vectors.

Based on the overall convergence results (Figure 4) we surmised that compounds containing a lactam moiety tended to converge slower than the corresponding lactim state, e.g., 5-t2 vs. 5-t1, etc. One reason for this is that the differences between the MM and the SQM(/MM) descriptions are greater for the lactams than for the lactims. The analysis of partial charges points in this direction as well. Several differences can be discerned in the detailed charge distribution data for the individual molecules, e.g., Figure S4. For all lactim–lactam pairs (compounds 2, 3, 4, 5, 6, 8),the MM

{RMSD}_{q}

of the lactam state is noticeably higher than that of the corresponding lactim state. Similarly, the MM

{RMSD}_{q}

of each of the lactams (red circle) is at least 0.1e higher than for MULL(solv) (blue squares).

In Figure 7 we visualize some points made above for the two molecules, for which obtaining a converged

Δ A^{MM \to SQM}

was most difficult, 5-t2 (top) and 6-t2 (bottom). For each of the three charge representations, MM, MULL(gas), and MULL(solv), we display the differential atomic charges, both as labels, as well as a color gradient from blue to red, and the resulting differential dipole moment vectors (orange arrows). The magnitude of the differential dipole moment

Δ μ = |Δ \vec{μ}|

and the angle

Θ_{SQM}

between the dipole moment of a charge distribution and that of the MULL(solv*) reference charge distribution are listed directly. The difference in partial charges and the MULL(solv*) charges can be large, e.g., the difference for the -C=O part of the lactam moiety is almost

\pm 0.5

e for 5-t2 (top, left in Figure 7). One further sees that the charge differences become smaller between the MM and the two MULL charge sets, atoms colored in clear blue or red for MM, e.g., the -C=O group (left side) have just a shade of blue or red for MULL(solv) (right side). Accordingly, the length of the orange arrows, i.e.,

Δ μ

of the MULL charge states, is noticeably smaller than that for MM. Even more detailed plots, including the exact values of the charge difference for each atom, for 3-t2, 4-t2, 5-t2, 6-t2, and 8-t2 can be found in Figures S7–S11.

3.4.2. Effects of charge distribution on the first solvation shell

Figure 8 displays the difference in the average number of waters

Δ N_{W a t e r s}

within

\leq 3

Å of the solute. The numbers are relative to the target high-level state, i.e., the average number of waters found in the SQM/MM simulations. First, one sees that

Δ N_{W a t e r s}

for the MULL(solv*) charges (green diamonds), which were also derived from the SQM/MM simulations, are very close to zero, in accord with expectation. Several results for the MULL(solv) charges (blue squares) are also quite close to zero, but mostly for the lactams (particularly 2-t2, 3-t2, 4-t2, 5-t2) the difference in waters

Δ N_{W a t e r s} \approx - 1

. For the MULL(gas) charges (orange triangles), all

Δ N_{W a t e r s}

values are negative, i.e., on average, there are fewer waters in close contact with the solute compared to SQM/MM. This is not too surprising because the charges were derived from SCC-DFTB gas phase calculations; hence, the solute did not experience polarization from interaction with the solvent. This may also explain why on average the use of the MULL(gas) hybrid intermediate state performed worse than MULL(solv), even though significantly better than MM (cf. Table 1).

The MM results (red circles) fall into two distinct groups, with either more or less water present than in SQM/MM. For most lactams, with the slightly surprising exception of 6-t2,

Δ N_{W a t e r s}

is negative, whereas the lactims, i.e., have a positive

Δ N_{W a t e r s}

. Thus, one sees again distinct differences in properties for the force field representations of lactims and lactams, respectively.

3.4.3. Water reorientation dynamics

In the previous subsections, we characterized the influence of different charge distributions (atomic partial charges) on static properties of the solute and the surrounding solvent layer in several ways. As the endpoints of the NEW switches have different charge distributions, it is of considerable interest to study the dynamics of solvent reorientation. We, therefore, computed the un-normalized Stoke shift relaxation

\bar{Δ U (t)}

(Eq. 12) for several compounds and the MM and MULL(solv) charge distributions. The results for 5t-2 are shown in Figure 9; analogous plots for 4t-2, 6t-2, and 8t-2 can be found in Figures S12–S14 in SI. All fit parameters are summarized in Table S13 of SI.

As described in Methods, the mono-exponential fit was carried out in such a way that it picks out the slow process(es) of the water reorientation. Looking at Figure 9, one sees that the relaxation times for MM and MULL(solv) are relatively comparable (

τ \approx 1

ps), but that the prefactor

Δ U_{0}

is quite different (

> 10

kcal/mol for MM and

\approx 4

kcal/mol for MULL(solv)). Thus, while

\bar{Δ U (t)}

has for all practical reasons reached 0 after about 3 ps in the MULL(solv) case (blue curve), for MM (red curve) this is the case only after 5 ps. Since we would ideally carry out NEW switches of only 2 ps length, the obvious question is the value of

Δ U (t = 2 ps)

. Using the fit parameters, we find for MM ≈ 1.7 kcal/mol, and for MULL(solv) ≈ 0.5 kcal/mol. These values roughly mirror the values for

δ Δ A

for MM/JAR(2ps) and MULL(solv), respectively. For the second problem case, 6-t2, very similar results were obtained; see Figure S13 and Table S13.

Overall, the slow decay process of

\bar{Δ U (t)}

always occurs with a time constant

τ \approx 1

ps. Thus, the degree to which the system is still out of equilibrium regarding the SQM/MM target state, therefore, depends crucially on how big the difference is at

t = 0

. Thus, it becomes clear that convergence is facilitated if the solute charge distributions of the initial and final state resemble each other, explaining why better results were obtained for all the MULL intermediate charge states. Phrased differently, to obtain converged JAR results from 2 ps switching simulations, the charge distributions of initial and final state have to be very similar. Since the determining factor seems to be the initial difference in charge distribution, using stepwise linear switching paths cannot help much, explaining at least in part why we did not obtain any real improvements from the stepwise linear protocols we attempted.

3.4.4. Interplay between charge distribution and conformational preferences

The compounds used in this work were mostly rigid and specifically chosen to avoid complications from conformational degrees of freedom, exceptions being 1-t1 and the blocked amino acids Ala and Ser. Modifying partial charges as for the MULL hybrid intermediate states may have an effect on conformational preferences. As can be seen in Figure 5, all three MULL hybrid intermediate states improve convergence for 1-t1. For the two blocked amino acids, MULL(gas) results in poorer convergence for Ala, and MULL(solv) in slightly poorer convergence for Ser. Given that Ala and Ser are the smallest possible peptide-like molecules with protein backbone-like

ϕ

and

ψ

dihedral angles, we did some analysis on dihedral angle conformational preferences. In earlier work [31] we showed that purely classical Hamiltonians and SQM/MM Hamiltonians resulted in different preferred conformations of

ϕ

and

ψ

, as well as

χ_{1}

in the Ser case. In Figure 10 and Figure 11

ϕ / ψ

maps for Ala and Ser are shown for MM, MULL(solv) and SQM. The differences between MM (left) and SQM (right) are clearly visible. For both blocked amino acids, MM has a single narrow minimum at

ϕ \approx 150^{\circ}

,

ψ \approx - 50^{\circ}

, whereas SQM has a broader distribution at

ϕ \approx 150^{\circ}

, and

- 150^{\circ} < ψ < - 50^{\circ}

. While the MULL

ϕ / ψ

maps (middle plots) are more similar to MM than to SQM, a second minimum at

ϕ \approx 150^{\circ} / ψ \approx - 150^{\circ}

has appeared. Thus, although the effect is small, the use of hybrid charge intermediates also makes this state slightly more high-level like in terms of conformational preferences.

4. Discussion

The optimized protocol proposed in Ref. [36] (200 switches of 2 ps length) would be efficient enough for many practical applications. However, it has only been validated in the gas phase. If water reorientation is responsible for the slower convergence in aqueous solution as observed previously [31,37], then, given the experimentally known slower relaxation time constant of

> 1

ps, switching lengths of 2 ps may not be long enough. This suspicion was confirmed in this work, and we found that in one case even a switching length of 5 ps was insufficient.

We explored two approaches to keep the switching length at 2ps: (i) introducing a hybrid intermediate charge state, which makes the solute charge distribution more similar to that at the SQM/MM level of theory. (ii) Since the solvent reorientation also has an ultrafast component, we used a stepwise linear switching scheme instead of the standard one, switching faster initially followed by one or two slower steps. While the use of hybrid intermediate charge states significantly improved convergence, the benefits of multi-stage switching protocols were mixed.

The partial charges of the intermediate hybrid states were obtained by averaging over the Mulliken charges obtained from 200 energy evaluations at the SQM level of theory. The computational cost of this is relatively small; the overhead, if any, comes from sampling the configurations over which one averages in the first place. Not surprisingly, the best results (lowest overall

δ Δ A

; see Table 1) were obtained from coordinate sets generated during SQM/MM simulations, i.e., calculations at the target high level of theory (MULL(solv*)). While we have performed such calculations to reference results with CRO, we would ideally like to avoid them in practical applications. In contrast, the cheapest approach is to use the coordinate sets saved during the MM gas phase simulations (MULL(gas)). While these results were an improvement over the direct MM→SQM/MM switches, the MAD(

δ Δ A

) was higher than that of MULL(solv*). The results for the third method tested, MULL(solv), fell in between. Here, the configurations to be re-evaluated at the SQM/MM level of theory were generated from MM simulations in the solvated phase. Since one can reuse the stored coordinates/restart files to start the NEW switching simulations, the computational cost is practically zero. All results obtained with the MULL(solv) hybrid intermediate charge state met our rather stringent quality criteria; the improvements over the direct MM→SQM/MM switches were noticeable, and the difference in convergence over the more expensive MULL(solv*) approach was small (MAD of 0.09 instead of 0.07 kcal/mol). Thus, the MULL(solv) approach to derive an intermediate charge representation seems to be a good compromise between computational cost and convergence improvement. All methods require an additional, free energy simulation to obtain

Δ A^{MM \leftrightarrow MULL}

at the low level of theory, but its computational cost is small compared to that of the NEW switching simulations needing SQM/MM Hamiltonians.

As reported in Performance of modified switching protocols, we have not seen consistent convergence improvements for all systems using the two-stage and three-stage switching protocols. This prompted a detailed analysis of how the MM and SQM/MM representations differ. Comparing the MM charges of the force field to the average Mulliken charges, one sees that there are indeed surprisingly large differences in charge distribution and hence the solute dipole moment. The differences in properties of the solute have a noticeable effect on the number of water molecules in close contact with the solute. Furthermore, by studying the time correlation function

\bar{Δ U (t)}

after switching from one level of theory to the other, we were able to confirm the time scales of experimental solvation spectroscopy. There clearly is a fast process with a relaxation time

≪ 1

ps, and a slower process with

τ \approx 1

ps. This slower time constant does not change much if the initial charge distributions are equal to MM or MULL(solv). If the difference between the initial and final distribution is large, then the effect of the unfinished solvent reorientation after 2 ps (the switching length we are aiming for), will also be large. If the two charge distributions are more similar, as is the case for MULL(solv), then

Δ U (t = 2 ps)

is small enough to have little effect on the results, even though the solvent reorientation may not be complete. On the one hand, this observation explains the effectiveness of using hybrid charge intermediate states. On the other hand, it helps to rationalize our failure to obtain consistently convergent results with the multistage switching schemes. While one could certainly construct more complex switching paths, the speed of the solvent reorientation would remain the same.

5. Conclusions

In summary, the reliable calculation of free energy differences between MM and SQM/MM levels of theory in aqueous solution requires either switching lengths of at least 5 ps or the use of an appropriate intermediate charge state. In terms of aiding convergence, the use of intermediate states is not new. For example, the internal degrees of freedom of a molecule or region that should be described at two levels of theory can be made more like the target high-level representation by force matching [29,33,73]. Thus, the use of intermediate charge states plays the same role for electrostatic interactions between the core region (the region changed from e.g., MM to SQM) and the environment (the region always described at the low level of theory) as force-matched parameters play for the bonded energy terms of the core region. Since intermediate charge states can be rationally designed without too much computational effort, the poor convergence of MM→SQM/MM simulations due to slow solvent reorientation can be easily circumvented.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org. Figure S1:

δ Δ A

including error bars; Figure S2:

δ Δ A

all points; Figure S3:

δ Δ A

all points including error bars; Figure S4: RMSD[q]; Figure S5:

Δ μ

[D]; Figure S6:

θ_{S Q M} [^{\circ}]

; Figure S7: 3-t2 Charge analysis; Figure S8: 4-t2 Charge analysis; Figure S9: 8-t2 Charge analysis; Figure S10: 4-t2 TIP3P Relaxation; Figure S11: 6-t2 TIP3P Relaxation; Figure S12: 8-t2 TIP3P Relaxation; Table S1: Additional information on model compounds; Table S2: Summary table of all

δ Δ A

results in solution; Table S3: Detailed results for

N_{s w i t c h}

=2000 fs

N_{r e p l i c a t e}

=200

Δ A_{s o l v}^{M M \to S Q M}

in solution; Table S4: Detailed results for

N_{s w i t c h}

=2000 fs

N_{r e p l i c a t e}

=200

Δ A_{g a s}^{M M \to S Q M}

; Table S5: Detailed results for

N_{s w i t c h}

=5000 fs

N_{r e p l i c a t e}

=200

Δ A_{s o l v}^{M M \to S Q M}

in solution; Table S6: Detailed results for

N_{s w i t c h}

=2000 fs

N_{r e p l i c a t e}

=200

Δ A_{s o l v}^{M U L L (g a s) \to S Q M}

in solution; Table S7: Detailed results for energy correction term via FEP

Δ A_{s o l v}^{M U L L (g a s) \to S Q M}

in solution; Table S8: Detailed results for

N_{s w i t c h}

=2000 fs

N_{r e p l i c a t e}

=200

Δ A_{s o l v}^{M U L L (s o l v) \to S Q M}

in solution; Table S9: Detailed results for energy correction term via FEP

Δ A_{s o l v}^{M U L L (s o l v) \to S Q M}

in solution; Table S10: Detailed results for

N_{s w i t c h}

=2000 fs

N_{r e p l i c a t e}

=200

Δ A_{s o l v}^{M U L L (s o l v *) \to S Q M}

in solution; Table S11: Detailed results for energy correction term via FEP

Δ A_{s o l v}^{M U L L (s o l v *) \to S Q M}

in solution; Table S12: Mulliken Charge Analysis; Table S13: TIP3P Water Relaxation

Author Contributions

Conceptualization, A.S., H.L.W, and S.B.; methodology, H.L.W. and S.B.; validation, A.S., and S.B.; formal analysis, A.S.; investigation, A.S.; resources, S.B.; writing—original draft preparation, A.S.; writing—review and editing, H.L.W. and S.B.; visualization, A.S.; supervision, S.B.; project administration, S.B.; funding acquisition, H.L.W. and S.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Institute of General Medical Sciences grant number 1R01GM129519-01 (HLW) and by Austrian Science Fund grant number P-31024 (SB).

Institutional Review Board Statement

Not applicable

Informed Consent Statement

Not applicable

Data Availability Statement

Any additional data not available in the main manuscript or the SI are available on request from the corresponding authors.

Acknowledgments

We thank Christian Schröder for helping with the calculations of Stoke shift relaxations.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MD	Molecular Dynamics
MM	Molecular Mechanics
(S)QM	(Semi-empirical-)Quantum Mechanics
(S)QM/MM	(Semi-empirical-)Quantum Mechanical/ Molecular Mechanical hybrid methods
FES	Free energy simulation
FEP	Free energy perturbation
TI	Thermodynamic integration
BAR	Bennett acceptance ratio
NEW	Non-equilibrium work methods
JAR	Jarzynski’s Equation
CRO	Crook’s Equation

References

Bash, P.A.; Singh, U.C.; Langridge, R.; Kollman, P.A. Free Energy Calculations by Computer Simulation. Science 1987, 236, 564–568. [Google Scholar] [CrossRef] [PubMed]
Gao, J.; Kuczera, K.; Tidor, B.; Karplus, M. Hidden Thermodynamics of Mutant Proteins: A Molecular Dynamics Analysis. Science 1989, 244, 1069–1072. [Google Scholar] [CrossRef]
Chodera, J.D.; Mobley, D.L.; Shirts, M.R.; Dixon, R.W.; Branson, K.; Pande, V.S. Alchemical free energy methods for drug discovery: progress and challenges. Curr. Opin. Struct. Biol. 2011, 21, 150–160. [Google Scholar] [CrossRef] [PubMed]
Barros, E.P.; Ries, B.; Böselt, L.; Champion, C.; Riniker, S. Recent developments in multiscale free energy simulations. Curr. Opin. Struct. Biol. 2022, 72, 55–62. [Google Scholar] [CrossRef]
Mey, A.S.; Allen, B.K.; Macdonald, H.E.B.; Chodera, J.D.; Hahn, D.F.; Kuhn, M.; Michel, J.; Mobley, D.L.; Naden, L.N.; Prasad, S.; et al. Best practices for alchemical free energy calculations [Article v1.0]. Living journal of computational molecular science 2020, 2. [Google Scholar] [CrossRef]
Schindler, C.E.; Baumann, H.; Blum, A.; Boöse, D.; Buchstaller, H.P.; Burgdorf, L.; Cappel, D.; Chekler, E.; Czodrowski, P.; Dorsch, D.; et al. Large-scale assessment of binding free energy calculations in active drug discovery projects. J. Chem. Inf. Model. 2020, 60, 5457–5474. [Google Scholar] [CrossRef]
Cournia, Z.; Allen, B.; Sherman, W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. J. Chem. Inf. Model. 2017, 57, 2911–2937. [Google Scholar] [CrossRef] [PubMed]
Limongelli, V. Ligand binding free energy and kinetics calculation in 2020. Wiley Interdisciplinary Reviews: Computational Molecular Science 2020, 10, e1455. [Google Scholar] [CrossRef]
Maier, S.; Thapa, B.; Erickson, J.; Raghavachari, K. Comparative assessment of QM-based and MM-based models for prediction of protein–ligand binding affinity trends. Phys. Chem. Chem. Phys. 2022. [Google Scholar] [CrossRef]
Yang, W.; Gao, Y.Q.; Cui, Q.; Ma, J.; Karplus, M. The missing link between thermodynamics and structure in F₁-ATPase. Proc. Natl. Acad. Sci. 2003, 100, 874–879. [Google Scholar] [CrossRef]
Mulholland, A.J. Chemical accuracy in QM/MM calculations on enzyme-catalysed reactions. Chem. Cent. J. 2007, 1, 1–5. [Google Scholar] [CrossRef]
Lev, B.; Roux, B.; Noskov, S.Y. Relative free energies for hydration of monovalent ions from QM and QM/MM simulations. J. Chem. Theory Comput. 2013, 9, 4165–4175. [Google Scholar] [CrossRef] [PubMed]
Riahi, S.; Roux, B.; Rowley, C.N. QM/MM molecular dynamics simulations of the hydration of Mg (II) and Zn (II) ions. Can. J. Chem. 2013, 91, 552–558. [Google Scholar] [CrossRef]
Delgado, J.A.; Wineman-Fisher, V.; Pandit, S.; Varma, S. Inclusion of High-Field Target Data in AMOEBA’s Calibration Improves Predictions of Protein–Ion Interactions. J. Chem. Inf. Model. 2022, 62, 4713–4726. [Google Scholar] [CrossRef] [PubMed]
Yang, W.; Cui, Q.; Min, D.; Li, H. QM/MM Alchemical Free Energy Simulations: Challenges and Recent Developments. Annu. Rep. Comput. Chem. 2010, 6, 51–62. [Google Scholar] [CrossRef]
Gao, J. Absolute free energy of solvation from Monte Carlo simulations using combined quantum and molecular mechanical potentials. J. Phys. Chem. 1992, 96, 537–540. [Google Scholar] [CrossRef]
Luzhkov, V.; Warshel, A. Microscopic models for quantum mechanical calculations of chemical processes in solutions: LD/AMPAC and SCAAS/AMPAC calculations of solvation energies. J. Comput. Chem. 1992, 13, 199–213. [Google Scholar] [CrossRef]
Cui, Q.; Pal, T.; Xie, L. Biomolecular QM/MM Simulations: What Are Some of the “Burning Issues”? J. Phys. Chem. B 2021, 125, 689–702. [Google Scholar] [CrossRef] [PubMed]
Hudson, P.S.; Aviat, F.; Meana-Pañeda, R.;Warrensford, L.; Pollard, B.C.; Prasad, S.; Jones, M.R.;Woodcock, H.L.; Brooks, B.R. 813 Obtaining QM/MM binding free energies in the SAMPL8 drugs of abuse challenge: indirect approaches. Obtaining QM/MM binding free energies in the SAMPL8 drugs of abuse challenge: indirect approaches. J. Comput.-Aided Mol. Des. 2022, pp. 1–15. [CrossRef]
Wang, X.; He, Q.; Sun, Z. BAR-based multi-dimensional nonequilibrium pulling for indirect construction of a QM/MM free energy landscape. Phys. Chem. Chem. Phys. 2019, 21, 6672–6688. [Google Scholar] [CrossRef]
Sun, Z.; Liu, Z. BAR-Based Multi-Dimensional Nonequilibrium Pulling for Indirect Construction of QM/MM Free Energy Landscapes: Varying the QM Region. Advanced Theory and Simulations 2021, 4, 2100185. [Google Scholar] [CrossRef]
Wang, J.N.; Liu, W.; Li, P.; Mo, Y.; Hu, W.; Zheng, J.; Pan, X.; Shao, Y.; Mei, Y. Accelerated Computation of Free Energy Profile at Ab Initio Quantum Mechanical/Molecular Mechanics Accuracy via a Semiempirical Reference Potential. 4. Adaptive QM/MM. J. Chem. Theory Comput. 2021, 17, 1318–1325. [Google Scholar] [CrossRef] [PubMed]
Hudson, P.S.; Han, K.; Woodcock, H.L.; Brooks, B.R. Force matching as a stepping stone to QM/MM CB [8] host/guest binding free energies: a SAMPL6 cautionary tale. J. Comput. -Aided Mol. Des. 2018, 32, 983–999. [Google Scholar] [CrossRef] [PubMed]
König, G.; Pickard, F.C.; Mei, Y.; Brooks, B.R. Predicting hydration free energies with a hybrid QM/MM approach: an evaluation of implicit and explicit solvation models in SAMPL4. J. Comput. -Aided Mol. Des. 2014, 28, 245–257. [Google Scholar] [CrossRef] [PubMed]
König, G.; Brooks, B.R.; Thiel, W.; York, D.M. On the convergence of multi-scale free energy simulations. Mol. Simul. 2018, 44, 1062–1081. [Google Scholar] [CrossRef] [PubMed]
König, G.; Brooks, B.R. Predicting binding affinities of host-guest systems in the SAMPL3 blind challenge: the performance of relative free energy calculations. J. Comput. -Aided Mol. Des. 2012, 26, 543–550. [Google Scholar] [CrossRef]
Zeng, J.; Tao, Y.; Giese, T.J.; York, D.M. Modern semiempirical electronic structure methods and machine learning potentials for drug discovery: Conformers, tautomers, and protonation states. J. Chem. Phys. 2023, 158, 124110. [Google Scholar] [CrossRef] [PubMed]
Heimdal, J.; Ryde, U. Convergence of QM/MM free-energy perturbations based on molecular-mechanics or semiempirical simulations. Phys. Chem. Chem. Phys. 2012, 14, 12592–12604. [Google Scholar] [CrossRef] [PubMed]
Giese, T.J.; York, D.M. Development of a robust indirect approach for MM→QM free energy calculations that combines force-matched reference potential and Bennett’s acceptance ratio methods. J. Chem. Theory Comput. 2019, 15, 5543–5562. [Google Scholar] [CrossRef] [PubMed]
Hudson, P.S.; Woodcock, H.L.; Boresch, S. Use of Nonequilibrium Work Methods to Compute Free Energy Differences Between Molecular Mechanical and Quantum Mechanical Representations of Molecular Systems. J. Phys. Chem. Lett. 2015, 6, 4850–4856. [Google Scholar] [CrossRef]
Kearns, F.L.; Hudson, P.S.; Woodcock, H.L.; Boresch, S. Computing Converged Free Energy Differences Between Levels of Theory via Nonequilibrium Work Methods: Challenges and Opportunities. J. Comput. Chem. 2017, 38, 1376–1388. [Google Scholar] [CrossRef]
Boresch, S.; Woodcock, H.L. Convergence of single-step free energy perturbation. Mol. Phys. 2017, 115, 1200–1213. [Google Scholar] [CrossRef]
Hudson, P.S.; Boresch, S.; Rogers, D.M.; Woodcock, H.L. Accelerating QM/MM Free Energy Computations via Intramolecular Force Matching. J. Chem. Theory Comput. 2018, 14, 6327–6335. [Google Scholar] [CrossRef] [PubMed]
Kearns, F.L.; Warrensford, L.; Boresch, S.; Woodcock, H.L. The Good, the Bad, and the Ugly: “HiPen”, a New Dataset for Validating (S)QM/MM Free Energy Simulations. Molecules 2019, 24. [Google Scholar] [CrossRef] [PubMed]
Hudson, P.S.; Woodcock, H.L.; Boresch, S. Use of Interaction Energies in QM/MM Free Energy Simulations. J. Chem. Theory Comput. 2019, 15, 4632–4645. [Google Scholar] [CrossRef] [PubMed]
Schöller, A.; Kearns, F.; Woodcock, H.L.; Boresch, S. Optimizing the Calculation of Free Energy Differences in Nonequilibrium Work SQM/MM Switching Simulations. J. Phys. Chem. B 2022, 126, 2798–2811. [Google Scholar] [CrossRef] [PubMed]
Ito, S.; Cui, Q. Multi-level free energy simulation with a staged transformation approach. J. Chem. Phys. 2020, 153, 044115. [Google Scholar] [CrossRef]
Maroncelli, M.; Fleming, G.R. Computer simulation of the dynamics of aqueous solvation. J. Chem. Phys. 1988, 89, 5044–5069. [Google Scholar] [CrossRef]
Maroncelli, M. The dynamics of solvation in polar liquids. J. Mol. Liq. 1993, 57, 1–37. [Google Scholar] [CrossRef]
Jarzynski, C. Nonequilibrium equality for free energy differences. Phys. Rev. Lett. 1997, 78, 2690. [Google Scholar] [CrossRef]
Dellago, C.; Hummer, G. Computing Equilibrium Free Energies Using Non-Equilibrium Molecular Dynamics. Entropy 2014, 16, 41–61. [Google Scholar] [CrossRef]
Pohorille, A.; Jarzynski, C.; Chipot, C. Good Practices in Free-Energy Calculations. J. Phys. Chem. B 2010, 114, 10235–10253. [Google Scholar] [CrossRef] [PubMed]
Crooks, G.E. Path-ensemble averages in systems driven far from equilibrium. Phys. Rev. E 2000, 61, 2361. [Google Scholar] [CrossRef]
Woodcock, H.L.; Miller, B.T.; Hodoscek, M.; Okur, A.; Larkin, J.D.; Ponder, J.W.; Brooks, B.R. MSCALE: a general utility for multiscale modeling. J. Chem. Theory Comput. 2011, 7, 1208–1219. [Google Scholar] [CrossRef] [PubMed]
Brooks, B.R.; Brooks III, C.L.; Mackerell Jr, A.D.; Nilsson, L.; Petrella, R.J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; et al. CHARMM: the biomolecular simulation program. J. Comput. Chem. 2009, 30, 1545–1614. [Google Scholar] [CrossRef]
Hunter, J.E.; Reinhardt, W.P.; Davis, T.F. A finite-time variational method for determining optimal paths and obtaining bounds on free energy changes from computer simulations. J. Chem. Phys. 1993, 99, 6856–6864. [Google Scholar] [CrossRef]
Geballe, M.T.; Skillman, A.G.; Nicholls, A.; Guthrie, J.P.; Taylor, P.J. The SAMPL2 Blind Prediction Challenge: Introduction and Overview. J. Comput.-Aided Mol. Des. 2010, 24, 259–279. [Google Scholar] [CrossRef] [PubMed]
Wahl, O.; Sander, T. Tautobase: an open tautomer database. J. Chem. Inf. Model. 2020, 60, 1085–1089. [Google Scholar] [CrossRef] [PubMed]
Wieder, M.; Fass, J.; Chodera, J.D. Fitting quantum machine learning potentials to experimental free energy data: predicting tautomer ratios in solution. Chem. Sci. 2021, 12, 11364–11381. [Google Scholar] [CrossRef]
Elstner, M.; Porezag, D.; Jungnickel, G.; Elsner, J.; Haugk, M.; Frauenheim, T.; Suhai, S.; Seifert, G. Self-consistent-charge density-functional tight-binding method for simulations of complex materials properties. Phys. Rev. B 1998, 58, 7260. [Google Scholar] [CrossRef]
Elstner, M. The SCC-DFTB method and its application to biological systems. Theor. Chem. Acc. 2006, 116, 316–325. [Google Scholar] [CrossRef]
Bennett, C.H. Efficient estimation of free energy differences from Monte Carlo data. J. Comput. Phys. 1976, 22, 245–268. [Google Scholar] [CrossRef]
Pearlman, D.A.; Kollman, P.A. The lag between the Hamiltonian and the system configuration in free energy perturbation calculations. J. Chem. Phys. 1989, 91, 7831–7839. [Google Scholar] [CrossRef]
Hermans, J. Simple analysis of noise and hysteresis in (slow-growth) free energy simulations. J. Phys. Chem. 1991, 95, 9029–9032. [Google Scholar] [CrossRef]
Jarzynski, C. Rare events and the convergence of exponentially averaged work values. Phys. Rev. E 2006, 73, 046105. [Google Scholar] [CrossRef] [PubMed]
Heid, E.; Harringer, S.; Schröder, C. The small impact of various partial charge distributions in ground and excited state on the computational Stokes shift of 1-methyl-6-oxyquinolinium betaine in diverse water models. J. Chem. Phys. 2016, 145, 164506. [Google Scholar] [CrossRef] [PubMed]
Schröder, C.; Heid, E. Computational solvation dynamics: Implementation, application, and validation. In Annual Reports in Computational Chemistry; Elsevier, 2020; pp. 93–154. [CrossRef]
Jo, S.; Kim, T.; Iyer, V.G.; Im, W. CHARMM-GUI: a web-based graphical user interface for CHARMM. J. Comput. Chem. 2008, 29, 1859–1865. [Google Scholar] [CrossRef] [PubMed]
Lee, J.; Cheng, X.; Swails, J.M.; Yeom, M.S.; Eastman, P.K.; Lemkul, J.A.; Wei, S.; Buckner, J.; Jeong, J.C.; Qi, Y.; et al. CHARMM-GUI input generator for NAMD, GROMACS, AMBER, OpenMM, and CHARMM/OpenMM simulations using the CHARMM36 additive force field. J. Chem. Theory Comput. 2016, 12, 405–413. [Google Scholar] [CrossRef] [PubMed]
Vanommeslaeghe, K.; MacKerell Jr, A.D. Automation of the CHARMM General Force Field (CGenFF) I: bond perception and atom typing. J. Chem. Inf. Model. 2012, 52, 3144–3154. [Google Scholar] [CrossRef] [PubMed]
Vanommeslaeghe, K.; Raman, E.P.; MacKerell Jr, A.D. Automation of the CHARMM General Force Field (CGenFF) II: Assignment of Bonded Parameters and Partial Atomic Charges. J. Chem. Inf. Model. 2012, 52, 3155–3168. [Google Scholar] [CrossRef]
Yu, W.; He, X.; Vanommeslaeghe, K.; MacKerell Jr, A.D. Extension of the CHARMM general force field to sulfonyl-containing compounds and its utility in biomolecular simulations. J. Comput. Chem. 2012, 33, 2451–2468. [Google Scholar] [CrossRef]
Neria, E.; Fischer, S.; Karplus, M. Simulation of activation free energies in molecular systems. J. Chem. Phys. 1996, 105, 1902. [Google Scholar] [CrossRef]
Feller, S.E.; Zhang, Y.; Pastor, R.W.; Brooks, B.R. Constant pressure molecular dynamics simulation: The Langevin piston method. J. Chem. Phys. 1995, 103, 4613–4621. [Google Scholar] [CrossRef]
Ryckaert, J.P.; Ciccotti, G.; Berendsen, H.J. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 1977, 23, 327–341. [Google Scholar] [CrossRef]
Essmann, U.; Perera, L.; Berkowitz, M.L.; Darden, T.; Lee, H.; Pedersen, L.G. A smooth particle mesh Ewald method. J. Chem. Phys. 1995, 103, 8577–8593. [Google Scholar] [CrossRef]
Cui, Q.; Elstner, M.; Kaxiras, E.; Frauenheim, T.; Karplus, M. A QM/MM implementation of the self-consistent charge density functional tight binding (SCC-DFTB) method. J. Phys. Chem. B 2001, 105, 569–585. [Google Scholar] [CrossRef]
Gaus, M.; Goez, A.; Elstner, M. Parametrization and Benchmark of DFTB3 for Organic Molecules. J. Chem. Theory Comput. 2013, 9, 338–354. [Google Scholar] [CrossRef] [PubMed]
Gaus, M.; Lu, X.; Elstner, M.; Cui, Q. Parameterization of DFTB3/3OB for Sulfur and Phosphorus for Chemical and Biological Applications. J. Chem. Theory Comput. 2014, 10, 1518–1537. [Google Scholar] [CrossRef] [PubMed]
Lu, X.; Gaus, M.; Elstner, M.; Cui, Q. Parametrization of DFTB3/3OB for Magnesium and Zinc for Chemical and Biological Applications. J. Phys. Chem. B 2015, 119, 1062–1082. [Google Scholar] [CrossRef] [PubMed]
Kubillus, M.; Kubař, T.; Gaus, M.; Řezáč, J.; Elstner, M. Parameterization of the DFTB3 Method for Br, Ca, Cl, F, I, K, and Na in Organic and Biological Systems. J. Chem. Theory Comput. 2015, 11, 332–342. [Google Scholar] [CrossRef]
Eastman, P.; Swails, J.; Chodera, J.D.; McGibbon, R.T.; Zhao, Y.; Beauchamp, K.A.; Wang, L.P.; Simmonett, A.C.; Harrigan, M.P.; Stern, C.D.; et al. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol. 2017, 13, e1005659. [Google Scholar] [CrossRef]
Pan, X.; Li, P.; Ho, J.; Pu, J.; Mei, Y.; Shao, Y. Accelerated computation of free energy profile at ab initio quantum mechanical/molecular mechanical accuracy via a semi-empirical reference potential. II. Recalibrating semi-empirical parameters with force matching. Phys. Chem. Chem. Phys. 2019, 21, 20595–20605. [Google Scholar] [CrossRef] [PubMed]

1	See https://academiccharmm.org/documentation/version/c47b1/corman
2	See https://academiccharmm.org/documentation/version/c47b1/select
3	https://www.dftb.org/parameters/ download/3ob/3ob-3-1-cc/
4	https://academiccharmm.org/documentation/version/c47b1/openmm

Figure 1. a) Indirect alchemical thermodynamic cycle to compute

Δ A_{X g a s \to s o l v}^{S Q M / M M}

(dashed arrow) in three steps according to Eq. 1. b) Indirect alchemical thermodynamic cycle to compute

Δ A_{X s o l v}^{M M \to S Q M / M M}

(red arrow) via hybrid charge intermediates; cf. Eq. 7.

Figure 1. a) Indirect alchemical thermodynamic cycle to compute

Δ A_{X g a s \to s o l v}^{S Q M / M M}

(dashed arrow) in three steps according to Eq. 1. b) Indirect alchemical thermodynamic cycle to compute

Δ A_{X s o l v}^{M M \to S Q M / M M}

(red arrow) via hybrid charge intermediates; cf. Eq. 7.

Figure 2. Model systems used in this work: 8 tautomer pairs, 7 taken from the SAMPL2 challenge [47], one (7-t1 ⇌ 7-t2) from Tautobase [48]. In addition, we used the blocked amino acids Ala and Ser as model solutes.

Figure 3. The standard linear (L1) and three modified switching protocols investigated. The total switching length

τ

was 2 ps in all four cases. The color code used for L2-1, L3-1 and L3-2 is also used in Results.

Figure 3. The standard linear (L1) and three modified switching protocols investigated. The total switching length

τ

was 2 ps in all four cases. The color code used for L2-1, L3-1 and L3-2 is also used in Results.

Figure 4. Comparison of

δ Δ A_{X s o l v}^{M M \to S Q M}

using three protocols without intermediate state: JAR(2ps): 2 ps forward switches from MM to SQM, JAR(5ps): 5 ps forward switches from MM to SQM, and CRO(2ps): 2 ps forward and backward switches. In all cases, the CRO results obtained from 5 ps switches were considered as the reference values

{CRO}_{Ref}

. For 5t-2 and 6t-2 the JAR(2ps) results are off-scale, which is indicated by the red arrows.

Figure 4. Comparison of

δ Δ A_{X s o l v}^{M M \to S Q M}

using three protocols without intermediate state: JAR(2ps): 2 ps forward switches from MM to SQM, JAR(5ps): 5 ps forward switches from MM to SQM, and CRO(2ps): 2 ps forward and backward switches. In all cases, the CRO results obtained from 5 ps switches were considered as the reference values

{CRO}_{Ref}

. For 5t-2 and 6t-2 the JAR(2ps) results are off-scale, which is indicated by the red arrows.

Figure 5. Comparison of

δ Δ A

obtained with three hybrid intermediate charge states: MULL(gas), orange triangles; MULL(solv), blue squares; MULL(solv*), green diamonds. All results were obtained from 200 NEW switches of 2 ps length to compute

Δ A^{MULL \to SQM}

and include the correction

Δ A^{MM \leftrightarrow MULL}

; cf. Eq. 14b. The JAR(2ps) results already shown in Figure 4 (red circles) are included for comparison purposes.

Figure 5. Comparison of

δ Δ A

obtained with three hybrid intermediate charge states: MULL(gas), orange triangles; MULL(solv), blue squares; MULL(solv*), green diamonds. All results were obtained from 200 NEW switches of 2 ps length to compute

Δ A^{MULL \to SQM}

and include the correction

Δ A^{MM \leftrightarrow MULL}

; cf. Eq. 14b. The JAR(2ps) results already shown in Figure 4 (red circles) are included for comparison purposes.

Figure 6. Performance of the three modified switching protocols for a subset of the compounds. Results for direct NEW switching protocols MM → SQM are shown as red circles, results for calculations using the MULL(solv) hybrid charge intermediate state as blue squares. The fill color of the circles/squares indicates the stepwise linear switching protocol used (green = L2-1, orange = L3-1 and salmon = L3-2). For comparison purposes, the MM/JAR(2ps) (red circles) and MULL(solv) results (blue squares), both using the default switching protocol L1, are included as well.

Figure 7. Graphical representation of differences in partial atomic charges compared to MULL(solv*) for 5-t2 (top) and 6-t2 (bottom). Charge differences are indicated as a color gradient from blue (

δ q = - 0.2 e

) to red (

δ q = + 0.4 e

). The differential dipole moment

Δ \vec{μ}

is displayed as an orange arrow. The magnitude

Δ μ = |Δ \vec{μ}|

and the angle

Θ_{SQM}

between the dipole moment of the respective charge distribution with that of the MULL(solv*) reference distribution are given below each structure.

Figure 7. Graphical representation of differences in partial atomic charges compared to MULL(solv*) for 5-t2 (top) and 6-t2 (bottom). Charge differences are indicated as a color gradient from blue (

δ q = - 0.2 e

) to red (

δ q = + 0.4 e

). The differential dipole moment

Δ \vec{μ}

is displayed as an orange arrow. The magnitude

Δ μ = |Δ \vec{μ}|

and the angle

Θ_{SQM}

between the dipole moment of the respective charge distribution with that of the MULL(solv*) reference distribution are given below each structure.

Figure 8. Difference in average number of waters closer than 3 Å compared to SQM/MM. The average number of waters in the SQM/MM simulations was used as the reference value.

Figure 9. Water reorientation dynamics of TIP3P solvent for 5-t2 when switching from MM and MULL(solv) to SQM/MM. Raw data are shown in gray; fit for MM in red, fit for MULL(solv) in blue. The fit parameters (

Δ U_{0}

in kcal/mol,

τ

in fs) are listed in the inset; cf. Eq. 13.

Figure 9. Water reorientation dynamics of TIP3P solvent for 5-t2 when switching from MM and MULL(solv) to SQM/MM. Raw data are shown in gray; fit for MM in red, fit for MULL(solv) in blue. The fit parameters (

Δ U_{0}

in kcal/mol,

τ

in fs) are listed in the inset; cf. Eq. 13.

Figure 10. Ramachandran plots of pseudo-dipeptide/blocked alanine.

Figure 11. Ramachandran plots of pseudo-dipeptide/blocked Serine.

Table 1. MAD (mean absolute deviation) for the results shown in Figure 5, as well as the spread of

δ Δ A

(respective smallest and largest absolute deviation).

Table 1. MAD (mean absolute deviation) for the results shown in Figure 5, as well as the spread of

δ Δ A

(respective smallest and largest absolute deviation).

		Spread MAD [kcal/mol]
Pathway	MAD [kcal/mol]	Min	Max
MM	0.29	0.01	1.80
MULL(gas)	0.12	0.01	0.49
MULL(solv)	0.09	0.01	0.28
MULL(solv*)	0.07	0.01	0.20

Table 2. MAD (Mean Absolute Deviation) over all compounds for

{RMSD}_{q}

, the magnitude of the differential dipole moment

Δ μ

, and the angle

Θ_{SQM}

between the dipole moment vector of the respective charge distribution and that of the MULL(solv*) charge distribution.

Table 2. MAD (Mean Absolute Deviation) over all compounds for

{RMSD}_{q}

, the magnitude of the differential dipole moment

Δ μ

, and the angle

Θ_{SQM}

between the dipole moment vector of the respective charge distribution and that of the MULL(solv*) charge distribution.

Pathway	MAD ${RMSD}_{q} [e]$	MAD $Δ μ$ [D]	MAD $θ_{S Q M}$ $[^{\circ}$ ]
MM	0.14	3.38	29.8
MULL(gas)	0.06	2.44	7.6
MULL(solv)	0.04	1.81	16.0

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Exploring Routes to Enhance the Calculation of Free Energy Differences via Nonequilibrium Work SQM/MM Switching Simulations by Using Hybrid Charge Intermediates Between MM and SQM Level of Theory or Non-linear Switching Schemes

Abstract

Keywords:

Subject:

1. Introduction

2. Materials and Methods

2.1. Computing free energy differences between levels of theory by NEW methods

2.2. Choice of model systems

2.3. Strategies to improve convergence of NEW simulations

2.3.1. Hybrid charge intermediates

2.3.2. Stepwise linear switching protocols

2.3.3. Analyses carried out

Characterizing charge distributions

Characterization of the first solvation shell

Dynamics of solvent reorientation

2.4. Overview of simulations carried out

2.4.1. Simulation details

Preparation and initial equilibration

Force field-based equilibrium simulations

SQM(/MM) equilibrium simulations

2.4.2. Non-equilibrium work simulations

2.4.3. Calculation of Δ A X s o l v MM ↔ MULL :

2.5. Overview of calculated free energy differences and paths

3. Results

3.1. Performance of 2 and 5 ps linear switching protocols in solution

3.2. Performance of hybrid charge intermediates

3.3. Performance of modified switching protocols

3.4. A detailed analysis of the factors affecting convergence

3.4.1. Effects of charge distribution on solute properties

3.4.2. Effects of charge distribution on the first solvation shell

3.4.3. Water reorientation dynamics

3.4.4. Interplay between charge distribution and conformational preferences

4. Discussion

5. Conclusions

Supplementary Materials

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

MDPI Initiatives

Important Links

Subscribe

2.4.3. Calculation of $Δ A_{X s o l v}^{MM \leftrightarrow MULL}$ :