3.1. General Idea of Calculation of the IR Spectra from MD Trajectories
The MD simulation produces an MD trajectory,
simulations points, which span the dynamics in time
t from
to
, where
is the time step of the simulation. Therefore, for various computed physical observables (
, e.g., coordinates, velocities, energy, dipole moment, polarizability, etc.), we can tell their values (
) at a
n-th given point of simulation (
), corresponding to the time
[
18].
As said in the introduction, the vibrational spectra from MD are computed with the help of the FDT [
16,
17]. We first need to compute the autocorrelation function (
) of the variable
that is responsible for the response to the external excitation. For the IR spectra, observable is the total dipole moment of the molecule; for the Raman spectra, it is the molecular polarizability tensor; for the power spectrum, these are simply the mass-weighted coordinates, etc. [
13]. For a real-valued observable, the autocorrelation function is defined as [
4,
5,
13,
40]
and it shows the repeatable periods in the time evolution of the given observable. The numerical calculation of the autocorrelation function for finite time series is available in various packages. For instance, in Python, one could use either the “
correlate” method from the NumPy module [
41] or the “
signal.correlate” method from the SciPy module [
42]. By performing the FT of the
as [
4,
5,
13,
40]
we can transfer the response characteristic from the time domain (
t) to the frequency domain (
), thus obtaining the spectrum of a given process. For numerical reasons, a more advantageous computational strategy is to use the velocity of the observable (
) to obtain the spectrum [
5]. In this case, the expression to be used is[
5]
The resulting spectra from Equations
2 or
3 can be multiplied with the correction factors that account for the statistical properties at the different spectral ranges [
5,
43].
The FT procedure on the MD results (Equation
2 or
3) is usually done using a fast-FT (FFT) approach [
38,
44], which provides certain limitations on the frequency discretization of the resulting spectra. The frequency grid increment (
) is determined by the total trajectory duration
through the Nyquist–Shannon–Kotelnikov (NSK) theorem [
45,
46,
47] as
, while the time step
determines the upper boundary of the spectrum as
. To increase the formal resolution, extending the trajectory for a given observable by an arbitrary number of steps with zeros is possible, which is called zero-padding [
48]. However, such direct extension will cause artifacts in the resulting spectrum; thus, the zero-padding should be combined with either frequency filters or smoothing techniques [
39,
48].
An alternative procedure to the FFT could be the regularized least-squares spectral analysis (rLSSA). The idea is to transform the time-dependent dataset
composed of
N time values
(
) with corresponding observable values
into a frequency domain with
M points
where
(
) is the frequency and the
is the corresponding spectral amplitude value. The rLSSA procedure is derived from the regularized weighted least-squares analysis (rwLSSA) procedure by choosing a unit weights matrix (see Refs. [
39,
49]). As a result, the rLSSA is just a matrix transformation[
39,
49]
where
is the
N-dimensional vector of the time-domain values,
is the
M-dimensional vector of the frequency-domain spectrum amplitude values,
is the
matrix of elements
with
being imaginary unit, and
is the conjugate transpose of matrix
. The matrix
of size
is defined as
with
being a unit matrix of the size
and the regularization parameter
provided by the regularization criterion[
39,
49]
To smooth the data even further, the parameter
can be increased compared to that from Equation
7.
To illustrate that, we can take the vibrational spectrum of the carbon dioxide (
) shown in
Figure 1. A single
-MD trajectory at the GFN2-xTB level of theory was obtained for a temperature of 300 K using the Berendsen thermostat and Maxwell-Boltzmann sampling of initial conditions. The trajectory was 1 ps in length with a time step of 0.5 fs, and half of the trajectory was taken as the equilibration phase. The NSK theorem provides the frequency resolution from the dataset of total length
ps to be
cm
−1, where
cm/ps is the speed of light, therefore upon applying the FFT procedure without zero-padding with frequency filtering we obtain quite coarse spectra. By applying the rLSSA approach to the same dataset with the requested frequency increment of
cm
−1, from the Equation
7 we get
and the more detailed spectrum (see
Figure 1). Note that the rLSSA procedure cannot provide new spectroscopic details; it just provides a smoother representation of the same dataset.
3.2. Cheapening Simulations by Using Large Integration Steps with Frequency Correction
The usual recommendation for computing the molecules’ vibrational spectra is to use as small time step
as possible. The reason for that is the artificial blue shift from the numerical integration error, which is especially prominent for the high-frequency vibrational bands [
50,
51]. However, it is possible to correct this behavior by using an analytical formula that compensates for such a shift. In the case of the Verlet algorithm and its equivalent methods (velocity Verlet and leapfrog) [
18], one needs to replace the observed frequencies from the FT (
) with the corrected ones (
) using the equation [
40,
52]
where
. This correction is applicable if the NSK-like limit
is satisfied, where
is the maximal vibrational frequency in the system. In the case of the higher-order methods, similar correction schemes can be derived; see an example in the
Appendix A.
To illustrate this artificial blue shift of vibrational bands and the effectiveness of the correction in Equation
8, we ran four sets of ten
-MD trajectories of methane (
) at GFN2-xTB level of theory with initial conditions sampled from the Maxwell-Boltzmann distribution at 300 K, each trajectory was 0.5 ps in length. The sets were differing by the time step
, namely with
fs. The mean vibrational spectra for each set of trajectories and their frequency-corrected counterparts are shown in
Figure 2. As one can see, the larger time steps lead to a noticeable shift of the C–H stretching band. In the reference spectrum with
fs such peak is located at approximately 3015 cm
−1, whilst at simulations with
fs this peak appears at approximately 3150 cm
−1. However, rescaling the frequency axis using Equation
8 restores the positions of the vibrational bands to their correct origin.
3.3. Simplified Wigner Sampling for Generating Initial Conditions
To include the NQEs in the MD simulations, we can use the approach of simplified Wigner sampling (SWS) introduced in Ref. [
14]. The sampling relies on generating simultaneously a displacement along the coordinate and momentum axis in a given direction. The displacements are generated using a Gaussian distribution with standard deviations[
14]
for coordinate and
for momentum. Here,
J
s is the Planck constant,
m is the mass of the nucleus,
J/K is the Boltzmann constant,
is a free parameter with the dimension of time, and
is the effective temperature defined as
where
is the temperature of the system.
The main problem of the SWS is the choice of the free parameter
. In work [
14], it was proposed to perform several MD simulations with different values of
and then take the parameter value that minimizes the total energy or the mean temperature of the simulation. In some sense, this approach relates to the variational principle, adjusting the underlying Wigner distribution to fit the PES as well as possible. However, such an approach is quite slow and inefficient. Thus, a cheaper criterion for the choice of
is required.
To provide an alternative criterion, we may use the harmonic approximation as the guiding principle, as single harmonic frequency calculation with analytical Hessian generally takes less computational time than running multiple MD trajectories. The total kinetic energy of the system is defined as [
18]
where
is the momentum vector of
k-th atom,
is the mass of the
k-th atom, and
is the total number of atoms in the system. In the case of the distribution sampled at
with SWS, the mean value of
is given as
(see Equation
10). This gives the total mean kinetic energy of
, while for a given degree of freedom, the mean kinetic energy is
. In the harmonic approximation, for a given
l-th mode (
, where
is the number of vibrational degrees of freedom) with vibrational frequency
, the kinetic energy is
[
53], and the mean kinetic energy over all of vibrational modes is
where
is the mean vibrational frequency of the system. By taking
, we arrive at the following expression
Equation
14 provides a criterion for the choice of
from the harmonic vibrational frequencies of the system. To show its applicability, we ran a set of
-MD calculations for molecules XH (
), XH
2 (
), XH
3 (
), and XH
4 (
) at the GFN2-xTB level of theory. The MD trajectories were obtained with 1 fs time step for 0.5 ps in total. The
for SWS initial conditions was scanned from 1 to 10 fs with 1 fs step, and the MD simulation for a given
contained ten individual trajectories. The optimal
was chosen as the one minimizing the mean total temperature of the MD simulation. The results are given in
Figure 3. As one can see, both criteria correlate with each other (Pearson correlation coefficient is 0.71). Therefore, we can apply the Equation
Figure 3 for other molecular systems.
3.4. Thermostats Incorporating Simplified Wigner Sampling
SWS approach, despite being quite crude, has its merits. Unlike the standard Wigner sampling, which can be used only at the beginning of the MD simulation, as the displacements are connected to the equilibrium reference structure, the SWS allows a re-sampling of the displacements during the simulation. This property makes the SWS compatible with canonical ensemble simulations, particularly with Andersen [
54] and Berendsen [
55] thermostats. Here, we will describe how to do it for both of these thermostats.
The canonical Andersen thermostat works by reassigning the velocity (or of one of the three components) of a randomly chosen atom according to the Maxwell-Boltzmann distributions at random points of MD simulation. The probability of reassignment is given as , where is the time of the reassignment. At each point of time, a random number is generated from the uniform distribution, and if condition is fulfilled, then the velocity reassignment procedure is initiated. Typically, should be greater than the characteristic periods of motions in the system, so the Andersen thermostat does not significantly disrupt the dynamics of the molecule. To combine the Andersen thermostat with the SWS, one has to replace the Maxwell-Boltzmann distribution resampling stage with the SWS routine, sampling both the momentum along given degrees of freedom and the displacement of the coordinate.
The Berendsen thermostat relies on the soft resampling of the velocities at each step of the MD simulation by multiplying them with a scale factor [
18,
55]
where
is the relaxation time, a free parameter of the Berendsen thermostat,
is the desired temperature of the MD simulation and
is the instant temperature of the nuclei in the MD simulations, defined as
with
being the total kinetic energy of the molecule at time
t, and
is the number of degrees of freedom (i.e.,
, where
is the number of atoms in the system and
is number of constraints). By replacing the
with the effective desired temperature according to Equation
11, we also make the Berendsen thermostat compatible with the SWS sampling.
3.5. Scaling of the vibrational spectra from the molecular dynamics
Scaling the harmonic vibrational frequencies and zero-point vibrational energies is a simple yet powerful tool for increasing the accuracy of IR spectra and thermodynamic properties calculations [
56]. In the case of vibrational frequencies, the multiplication by a precomputed scale factor for a given method accounts for the systematic errors due to the quantum-chemical approximation quality and the absence of the anharmonic effects [
56,
57]. Therefore, it seems like a reasonable suggestion to also allow for scaling of the anharmonic spectra from the MD simulations.
Here, we propose the procedure for obtaining the scale factors (
) for IR spectra from the MD and also provide a set of precomputed values for three quantum-chemical approximations: BLYP-D3(BJ)/6-31G, PBE-D3(BJ)/6-31G, and PBEh-3c. For this, we have taken six molecules: water (
), ammonia (
), methane (
), ethane (
), methylamine (
), and methanol (
). For each of these molecules at each of the quantum-chemical methods, three trajectories were computed using the combination of Andersen and Berendsen thermostats with
fs and
fs and the SWS sampling with the
taken according to criterion from Equation
14. Each trajectory was at
K with the time step of
fs and the total duration of 1 ps. The spectra were computed using the FFT procedure without ignoring the equilibration phase and with the application of the frequency correction from Equation
8. The experimental gas-phase IR spectra of the same molecular species were taken from the NIST Chemistry WebBook [
58].
It is not straightforward to compare theoretical spectra with a given frequency increment with their experimental counterparts since they have different frequency increments. Moreover, when the frequency axis is scaled for the spectra from the MD, the frequency increment will also change. Therefore, the following procedure was applied to determine the scale factor of the MD spectra for a given molecule. Let us assume that the MD spectrum is given as a set of
M points
and the experimental spectrum is a set of
N points
, where
and
are the
i-th frequency and intensity obtained with method “
X”. To remove the rotational substructure of the bands for small molecules, such as water or methane, the experimental data were also smoothed by a convolution with the Gaussian function. The frequency scale factor
is obtained by maximizing the following expression
where
and
are the vectors composed of intensity values of
N and
M dimensions each, and the
is the
matrix composed of elements
with
,
, and
with
being the frequency increments in the respective datasets. The scale factor
was searched for in the interval
.
Figure 4 shows an example of the result of the procedure described above. In this Figure, the unscaled and scaled spectra of methane obtained from MD simulations at the PBEh-3c level of theory are shown, as well as the corresponding raw and smoothed experimental data from the NIST Chemistry Webbook. As one can see, the scaling indeed improves the positions of the vibrational bands in the spectra. The scale factors obtained from all the training set molecules were averaged to produce the estimated scale factor for a given level of theory. The results are given in
Table 1. In the cases of the BLYP and PBE-based spectra, the ammonia had to be removed from the dataset as the most intense low-frequency vibrational band, corresponding to the valence band vibrations, was too far off from the experimental position.