In this section, we present the software and user-friendly graphical interface for performing each of the four different steps of the basic MDSPACE and MDTOMO workflow and discuss the places at which MDSPACE and MDTOMO differ.
3.2. Prepare simulation
This step allows preparing the input atomic model for MD simulations and calculating its normal modes, both used in the next step. The imported atomic model is first rigid-body aligned with the imported data to optimize the flexible fitting of this model to the data in the next step. To this goal, a 3D reconstruction is first calculated from the imported particle images (“3D reconstruction” box in the tree in
Figure 1B) or a subtomogram average is calculated from the imported subtomograms (“Average subtomogram” box in the tree in
Figure 1C). Then, the atomic model is rigid-body aligned with this 3D density map using ChimeraX (“Chimerax - Rigid Fit” box in
Figure 1B,C).
The topology model is then constructed, which should be suitable to the force field that will be chosen in the next step (all-atom CHARMM, all-atom Gō, or Cα-atom-based Gō). In our experience, Cα-atom-based Gō models produce satisfactory results at low computational costs. Therefore, the workflow proposes to construct a Cα-atom-based Gō topology model. Alternatively, the workflow may include constructing a CHARMM topology model before constructing a Gō model (“All-atom model” box before “C-Alpha Go model” box in Figures 1C and 2A), which can be useful with the structures for which SMOG has a difficulty to construct the Gō model directly and it works well when starting from a CHARMM model.
Then, this model is energy minimized, which is specified by selecting “Minimization” as the simulation type (
Figure 2B). All the parameters related to the simulation at this step (energy minimization) can be kept at their default values (the full documentation on the different simulation parameters can be found at the GENESIS website,
https://www.r-ccs.riken.jp/labs/cbrt). The results of the energy minimization (e.g., energy and structural variations during the energy minimization) can be checked by opening the corresponding viewer, by first selecting the corresponding box in the workflow (“Energy Min” box in
Figure 1B,C) and then pressing the red “Analyze Results” button (in the Scipion project window).
This step also includes NMA of the energy minimized structure to calculate normal modes, which will be used within NMMD simulations to analyze data in the next step. The NMA results viewer allows using VMD to observe the motions simulated along each normal mode and displaying the collectivities and frequencies of the normal modes. The NMA viewer can be open by selecting the corresponding box in the workflow (“Normal Mode Analysis” box in
Figure 1B,C) and pressing the red “Analyze Results” button.
3.3. Run MDSPACE/MDTOMO
This step allows data analysis using NMMD simulations started from the energy minimized model obtained in the previous step. The graphical interface for this step (
Figure 3A) is very similar to the graphical interface used for energy minimization in the previous step (
Figure 2B). The three main differences are as follows: (1) dataset to analyze should be specified in the “EM data” tab at this step (
Figure 3C), whereas “None” should be specified in this tab for energy minimization; (2) “Simulation type” in “Simulation” tab at this step should be set to “Normal Mode Molecular Dynamics (NMMD)” (
Figure 3A), whereas “Minimization” should be specified in this tab for energy minimization; and (3) availability of an additional tab (“MDSPACE Refinement” tab in
Figure 3D) at this step allows specifying the number of iterations of the conformational space refinement and the number of principal components of the conformational space that are kept at the end of each iteration and used in place of normal modes in the next iteration for this refinement. This step is the most important and time consuming step in the workflow. Therefore, we describe its parameters in more detail, in the order in which the corresponding tabs appear in the graphical interface that is shown in
Figure 3A.
Refinement: The set of parameters in this section allows specifying the number of iterations and the number of the PCA components for the iterative conformation-space refinement (the number of the principal components to keep after each iteration and use them to replace the normal mode vectors in the next iteration). In most cases, a few iterations (less than 4) and a few principal component vectors (3-5) are enough (
Figure 3D).
Inputs: This section allows selecting the initial model for the NMMD simulation. To select the energy minimized model obtained at Step 2, one can select “restart previous GENESIS simulation” and specify the available energy minimization results (
Figure 3D).
Simulation: This section allows choosing the type of simulation (among Minimization, MD simulation, NMMD, Replica Exchange MD, etc.) and its parameters. For this step of the workflow, we recommend choosing NMMD. If NMMD is chosen, this section allows defining the parameters related to MD simulation (“Simulation parameters“ section) and those related to the use of normal modes in the simulation (“NMMD parameters” section) (
Figure 3A). NMMD integrates over time atomic coordinates and normal-mode amplitudes, whereas classical MD simulations integrate atomic coordinates only. The numerical integration in NMMD is performed using the Velocity Verlet integrator, which has good numerical stability and is commonly used in classical MD-based approaches. Thus, if NMMD is chosen as the simulation type, the integrator in the “Simulation parameters“ section should be set to “Velocity Verlet” (
Figure 3A). The MD simulation parameters that may require adjustments for different datasets are the number of simulation steps and the time step (
Figure 3A). The “Time step” parameter value of 0.002 ps is suitable in many cases, but may need to be decreased (e.g., to 0.001 ps or 0.5 fs) for larger complexes to ensure the stability of the simulation. The number of steps of 20000 (“Number of steps” parameter in
Figure 3A) allows the simulation length of 40 ps, when using the time step of 0.002 ps. With some complexes, longer simulations may be required to reach the conformations that are present in the data (target conformations). To adjust these parameters, one may run Step 3 on a few images (or subtomograms) and check how the cross-correlation (CC), root mean square deviation (RMSD), and energy are changing during the simulation.
In the “NMMD parameters” section, the user needs to specify the normal modes that will be used. Note that the first 6 normal modes (6 lowest-frequency modes) are related to rigid-body motions and are not used. The use of the next 10 lowest-frequency normal modes (modes 7-16) will be enough in many cases, in particular with asymmetric structures. With symmetric structures, it might be necessary to use more than 10 modes to include all the modes that describe the same motion along different symmetry axes. In some cases, it may be useful to also include some of potentially relevant, higher-frequency motions. As mentioned above, these motions can be visualized and preselected at Step 2 using VMD. The computational cost of including a larger number of normal modes in NMMD simulations is negligible with respect to the computational cost of MD simulations. Thus, a larger number of normal modes can be included without a significant increase in the computational cost. The “NM time step” and “NM mass” parameters (
Figure 3A) define the speed of integrating the displacement along normal modes in NMMD. In general, the normal-mode time step parameter (“NM time step”) is the same as the MD simulation time step (“Time step”). The value of the “NM time step” parameter may be increased to accelerate the integration, but this can make the simulation unstable. The value of the “NM mass” parameter is usually between 5 and 10. Lower “NM mass” values accelerate the simulation, but can make it unstable. Usually, slower simulations are used for the analysis of subtomograms than for the analysis of single particle images, to avoid instability of the simulation during the data fitting due to the higher noise in the subtomogram data. The default values of “NM mass” and “Number of steps” in the proposed MDTOMO workflow template (“MDTOMO” box in
Figure 1C) are 10 and 50000, respectively, whereas they are respectively 5 and 20000 in the proposed MDSPACE workflow template (“MDSPACE” box in
Figure 1B). In both workflow templates, the default value of the “Time step” parameter is 0.002 ps. As already mentioned, these values may need to be modified in some cases of complexes, which can be done in preliminary experiments using a few images (or subtomograms).
MD parameters: This section defines other MD simulation parameters (
Figure 3B). The majority of the parameters in this section can be kept at their default values (the full documentation on the different simulation parameters can be found at the GENESIS website,
https://www.r-ccs.riken.jp/labs/cbrt). The value of the “Temperature” parameter is usually between 100 K and 300 K. To avoid instability of the simulation, the temperature can be decreased (e.g., to 50 K). The adjustment of the temperature should be done in preliminary experiments with a few images (or subtomograms).
EM data: This section allows specifying the data that will be analysed (by flexible fitting using NMMD simulations of the initial model) and the fitting parameters. The “Cryo-EM flexible fitting” field allows choosing the data type, which can be “Image(s)” or “Volume(s)” for analyzing single particle images or cryo electron subtomograms, respectively. Note that the selected data type in
Figure 3C is “Image(s)”, which is specific to the MDSPACE workflow template. In the case of MDTOMO workflow template, the “Cryo-EM flexible fitting” field is set to “Volume(s)”. The section allows defining two sets of parameters: “Image Parameters” and “Fitting parameters”. The “Image Parameters” section allows specifying the dataset to analyze (a set of single particle images or subtomograms, their initial rigid-body alignment parameters, and pixel/voxel size) (
Figure 3C). The “Fitting parameters” section allows setting the parameters related to the flexible fitting (biasing potential). The “Force constant” parameter (
Figure 3C) defines the weight that will be given to the biasing potential to guide the fitting towards the data, which should be chosen carefully. Too high values of the force constant will bias the fitting too fast and too much towards the data, which may lead to structural distortions due to noise in the data and potential overfitting. Too low values will not bias the fitting enough and the simulation may not reach the target conformation. Thus, due to the higher noise and the risk of the simulation instability and data overfitting when analysing subtomograms than when analysing single particle images, the default value of the force constant in the proposed MDTOMO workflow template (“MDTOMO” box in
Figure 1C) is 1000, whereas it is 3000 in the proposed MDSPACE workflow template (“MDSPACE” box in
Figure 1B). As for the parameters in the “Simulation” section (“Number of steps”, “Time step”, “NM time step”, and “NM mass”,
Figure 3A), the value of the force constant should be adjusted in preliminary experiments using a few images (or subtomograms), by checking the CC, RMSD, and energy over the simulation and the potential distortions of the fitted model (e.g., a too fast increase in the CC may be a sign that the force constant is too high). The other parameters in the “Fitting parameters” section can be kept at their default values. For instance, the “EM fit gaussian variance” parameter (
Figure 3C) defines the standard deviation of the 3D Gaussian functions that are placed at atomic positions to simulate the data for their comparison with the experimental data during the fitting (the comparison of images in the case of analysing single particle images or the comparison of density maps in the case of analysing subtomograms), and its default value (2 Å) will produce good results in the majority of cases.
MPI parallelization: This section defines how the simulations are distributed over the available resources. For most local machines, there is no need to change the default values of the parameters in this section (
Figure 3D) and one should only set the number of CPU cores and the number of threads (“Parallel” section in the top left corner, where the “MPI” parameter is the number of CPU cores and the “Threads” parameter is the number of threads per core,
Figure 3A). When running on clusters with multiple nodes, it is recommended to use “Running on cluster ?” (
Figure 3D) to efficiently distribute the simulations over different nodes.
Analysis of theresults of Step 3: The results of this step can be analysed by opening the viewer related to this step, by clicking first on the corresponding box in the workflow (“MDSPACE” or “MDTOMO” box in
Figure 1B,C), and then, on the red “Analyze Results” button. This viewer allows displaying statistical analysis of the energy, CC, normal mode amplitudes, and RMSD trajectories over a selected set of simulations (selected particle images or subtomograms in the “Simulation selection” field in
Figure 4). Also, for one selected particle image or subtomogram, it allows displaying the initial and final 3D structures with ChimeraX and animating the trajectory of atomic coordinates over the simulation with VMD (“Display results in Chimerax” and “Display trajectory in VMD” in
Figure 4).
3.4. Analyze conformational space
To analyze the conformational space populated by the models obtained in Step 3 (the models fitted to the data), these models can be projected onto a low-dimensional space using PCA or UMAP dimension reduction methods. Before PCA (“PCA” box in
Figure 1B,C) or UMAP (“UMAP” box in
Figure 1B,C), the models should be rigid-body aligned (e.g., with respect to the initial conformation) to discard the rigid-body motions introduced during the MD simulation (‘Rigid body align” box in
Figure 1B,C).
The “PCA / UMAP” results can be visualized and analyzed by opening the corresponding viewer, by first clicking on the “PCA” or “UMAP” box (
Figure 1B,C) and, then, on the red “Analyse Results” button. This viewer allows displaying the variance explained by the different PCA axes (
Figure 5), the conformational and free energy landscapes (in up to 3 dimensions) by specifying the PCA/UMAP axes to display (
Figure 5 and
Figure 6), atomic motion trajectories along different directions in this space (principal axes or a free-hand trajectories) by using “Open Animation Tool” (
Figure 6), and clustering the points in this space (
Figure 6) along the different directions automatically (clusters linearly distributed along a specified direction or obtained by K-means clustering) or by manual selection of points. The clusters can be exported into the Scipion project (
Figure 6) to calculate 3D average density maps from the clusters (3D reconstructions when analyzing images and subtomogram averages when analyzing subtomograms). The average density maps and the average atomic models obtained from the clusters can be visualized using the corresponding viewer (by first clicking on the box related to the exported clusters and, then, on the red “Analyse Results” button). This clusters-related viewer allows displaying ChimeraX animations of the trajectory of the average atomic models superposed with the trajectory of the average density maps. This animation can be saved in MP4 video file format via the ChimeraX command-line section.