1. Introduction
Fast molecular docking methods are widely used tools of drug design and molecular engineering [
1]. The docking procedure aims to search the target-ligand conformational space at a reasonable computational cost to predict the most probable binding mode (position, orientation, and conformation) of ligands in the binding pocket of a target [
2,
3]. While docking is undoubtedly a leading technique, it still faces persistent challenges [
4,
5,
6,
7,
8,
9] especially when it comes to large, flexible peptide ligands. However, peptides mediate up to 40% of naturally occurring protein-protein interactions and play a central role in various cellular processes, including signal transduction, transcriptional regulation, immune response, and oncology [
10,
11,
12,
13,
14]. Structural models of peptide–protein complexes have been used to design inhibitory peptides and peptidomimetics that modulate protein–protein interactions involved in various disease pathways [
14,
15,
16,
17,
18,
19,
20]. In addition to their high specificity and relatively low toxicity [
21,
22,
23,
24], peptides were able to successfully target protein complexes such as transcription factors, which were considered undruggable by small molecules due to their huge structures and stable state [
25,
26]. Thus, the solution to the peptide docking problem would remarkably foster many drug development projects.
The problem of peptide docking originates from three main roots. Firstly, peptide-mediated interactions are often transient and their strength is typically weaker than that of protein-protein interactions since peptides bind to large, mostly shallow binding pockets having high structural flexibility before complexation [
11,
27]. Secondly, peptides have relatively large sizes and high conformational flexibility that require global search. Thirdly, peptides have many hydrophilic regions, and therefore, they are extensively hydrated which further complicates the docking process [
28,
29,
30,
31]. Notably, the elucidation of the role of hydration and water networks is a general problem in drug design that is not restricted to peptide ligands [
32,
33,
34,
35,
36,
37,
38,
39,
40]. However, incorporating explicit water molecules in molecular docking is rather challenging [
41,
42,
43,
44]. Large complexes such as adenosine A2A [
45] and histone proteins [
46] were shown to be challenging due to their extensive water-mediated H-bond network with their ligands, however, only a few refinement methods are designed to incorporate experimental [
47] or predicted [
41] water molecules in their protocol to assist formation of accurate mediated interactions during simulations.
Unfortunately, the above-mentioned problems of peptide docking seem to persist even in the cases of the latest methodologies based on artificial intelligence. The recent release of DeepMind’s AlphaFold2 (AF2) [
48,
49] and AlphaFold-Multimer (AFM2) [
50,
51] has brought the accuracy of the computational modeling of proteins to another level. Several studies have shown that both AFM2 and the input-manipulated versions of AF2 are able to predict protein-peptide complexes with high accuracy [
51,
52,
53,
54]. However, they have several major limitations including their (i) protein-only predictions, excluding cofactors, ions, or any post-translational modifications, (ii) inconsistency in the prediction quality of secondary structures and other local conformations due to their over- and under-representations during the training process [
52,
53], (iii) complete neglect of the effect critical water molecules at the binding interface, (iv) decreased prediction accuracy for protein side-chains [
54] and (v) inadequate modeling of conformational flexibility [
49,
55] which is crucial for modeling ligand binding with induced fit. A recent study also showed that despite the excellent structural agreement of their predicted ligand bound conformation to the experimental one, deep learning-based docking methods often produce physically implausible structures [
56] and can be outperformed by standard, physics-based docking methods.
Thus, physics-based refinement, like molecular dynamics (MD) simulations can help to fix the problems of structures generated by deep learning or other knowledge-based methods [
57,
58,
59]. There have been several attempts to improve the quality of AF2 and AFM2 models using MD simulations [
57,
58] or applying their own recycling process when the models are used as custom template inputs [
60]. The recycling route of AF2 passes the partially completed proto-model through the deep neural networks repeatedly (four times by default) [
49]. Although recycling significantly improved the model quality in most cases, unrealistic atomic positions were observed [
60] in the recycled models due to their unrelaxed nature. Therefore, they also suggested the combination of MD protocols and the AF2 recycling process to improve models for further applications, such as drug discovery.
Similar, post-docking refinement steps have been also introduced in many docking protocols (
Table S1). A refinement step prior to ranking could introduce structural flexibility and improve the energetics of the interface for proper scoring [
61,
62]. Refinements can range from short energy minimizations [
63,
64] removing steric clashes to more sophisticated methods that allow binding site flexibility upon ligand binding using MD [
41,
47,
65,
66,
67,
68,
69,
70,
71,
72] or Monte-Carlo simulations [
73,
74]. Refinement protocols or standalone tools often include energy minimization and much longer MD simulations in nanoseconds accompanied by various optimized parameters of the simulation [
41,
47,
68,
70,
71]. The docking scores are often used for ranking the refined structures accompanied by structural clustering simulation [
41,
47,
63,
64,
68,
70,
71,
73,
74,
75]. As MD-based protocols can effectively incorporate the effects of explicit water molecules and the flexibility of both protein target and ligand [
75,
76], several studies have reported the use of MD simulations for improvement of the docked poses of various ligands [
41,
69,
70,
77,
78,
79] including peptides [
47,
65,
67,
68,
69,
70,
71,
75,
80].
While MD has proved useful in the above-mentioned structural refinements, an ultimate protocol has not been published that may be appropriate for any peptide ligands. The large variability of peptide ligands necessitates the systematic investigation of various MD parameters like simulation length, temperature, water model, restraints, clustering, etc. In this study, we focused on histone H3 peptides in complexes with their reader proteins that have crucial therapeutic, as well as diagnostic importance in various cancer and other diseases [
81,
82,
83,
84,
85,
86,
87,
88,
89,
90,
91,
92,
93]. At the same time, histones are particularly challenging ligands for molecular docking as they often interact with shallow binding pockets on the reader proteins with weak interactions measured in micromolar binding constants [
94]. Moreover, their large conformational flexibility [
24,
95,
96,
97,
98,
99] and extensive hydration in such shallow binding pockets further complicates the prediction of accurate binding modes [
24,
29,
99,
100].