3.1. Biosynthetic classes and network analysis for the BGCs from pathogenic Fusarium species
To maximize the identification of BGCs in the genomes of selected
Fusarium species, the annotation information of 35 pathogenic
Fusarium genomes was analysed for prediction using the antiSMASH tool. A total of 1733 putative BGCs were detected, with an average of 51 BGCs per species. Among these species,
F. haemophilus had the highest number of BGCs (65), whereas
F. kerogenes and
F. bacilli had the lowest number of BGCs (38) (
Figure 1,
Table S2). To investigate whether the number of BGCs present in different species correlated with their evolutionary relationships, an evolutionary tree based on single-copy orthologous genes was constructed, which revealed two distinct branches containing these species. Six species, including
F. decemcellulare, formed a smaller branch, while the remaining twenty-nine species were grouped in the other branch. However, there was no clear differentiation between the branches, nor was there any uniformity within them (
Figure 1). In terms of BGC types, NRPS BGCs were the most abundant type in every species, and most species contained dimethylallyltryptophan synthase (DMATS) BGCs labelled as indole, although the number of these DMATS BGCs did not exceed three (
Figure 1).
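The per-species tallies described above can be reproduced from a flat list of predicted regions. The following is a minimal sketch, assuming each predicted region has already been exported as a (species, BGC type) pair. The record layout, the function name summarise_bgcs and the toy records are hypothetical illustrations, not the antiSMASH output format and not the data in Table S2.

```python
from collections import Counter, defaultdict


def summarise_bgcs(records):
    """Tally BGCs per species and per (species, type).

    `records` is an iterable of (species, bgc_type) pairs; this layout is
    an assumption made for illustration, not an antiSMASH output format.
    """
    per_species = Counter()
    per_type = defaultdict(Counter)
    for species, bgc_type in records:
        per_species[species] += 1
        per_type[species][bgc_type] += 1
    # Mean number of BGCs per species (0.0 for an empty input).
    mean = sum(per_species.values()) / len(per_species) if per_species else 0.0
    return per_species, per_type, mean


# Toy records (hypothetical, not taken from the study).
toy = [
    ("species_A", "NRPS"), ("species_A", "NRPS"), ("species_A", "terpene"),
    ("species_B", "NRPS"), ("species_B", "indole"),
]
counts, by_type, mean_per_species = summarise_bgcs(toy)
richest = max(counts, key=counts.get)  # species with the most BGCs
```

The same tallies, grouped by BGC type, are what a stacked bar chart of the kind shown alongside a species tree would plot.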
To gain a deeper understanding of these BGCs, a gene cluster family (GCF) network analysis was performed using BiG-SCAPE. Unfortunately, due to compatibility issues between BiG-SCAPE V1.15 and antiSMASH V7.0.0, a total of 112 RiPP-type BGCs and 26 other-type BGCs (NRP-metallophore and phosphonate) could not be identified and classified by the BiG-SCAPE pipeline. Nevertheless, BiG-SCAPE successfully classified 1640
Fusarium-derived BGCs and 1918 identified BGCs into 141 GCFs and 2047 individual clusters (
Figure S1) based on the similarity of predicted protein-coding domains. Within the GCF networks consisting of more than ten BGCs, several networks were observed that consisted exclusively of a single type of BGC. Specifically, twelve networks consisted exclusively of NRPS BGCs, nine networks consisted exclusively of terpene BGCs, five networks consisted exclusively of type I polyketide synthase (PKS) BGCs, and one network consisted of PKS_other BGCs (
Figure 2). Among the 141 GCF networks, the most complex mixed network was formed by seventy PKS_NRPS hybrid BGCs, sixteen type I PKS BGCs and two NRPS BGCs, giving a total of 19 identified hybrid BGCs. Based on these results, seventeen GCF networks were identified, including five type I PKS GCF networks (BIK_GCF, alt_GCF, DEP_BGC, ACTT/PKS19_GCF, and fsr_GCF), four NRPS GCF networks (APS_GCF, aba_GCF, san_GCF and chry_GCF), four terpene GCF networks (Ffsc4_GCF, SQS1_GCF, GA_GCF and tri_GCF), two PKS_NRPS hybrid GCF networks (ZEA_GCF and FSL_BGC) and two other GCF networks (has_GCF and fsd_GCF) (
Figure 2,
Table S3).
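The idea of grouping BGCs into families and leaving unconnected BGCs as individual clusters can be illustrated with a simple distance cutoff. The sketch below is a simplified stand-in, not the BiG-SCAPE algorithm: it treats every pair of BGCs whose distance is at or below a cutoff as linked and reports the connected components. The function name, the cutoff value and the toy distances are assumptions made for illustration.

```python
def group_into_families(names, distances, cutoff=0.3):
    """Group BGCs into families as connected components.

    Two BGCs are linked when their pairwise distance is <= cutoff.
    `distances` maps (name_a, name_b) tuples to floats. This single-linkage
    grouping is an illustrative simplification, not BiG-SCAPE itself.
    """
    parent = {n: n for n in names}

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= cutoff:
            parent[find(a)] = find(b)

    families = {}
    for n in names:
        families.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in families.values())


# Toy distances (hypothetical values).
toy_names = ["bgc1", "bgc2", "bgc3", "bgc4"]
toy_distances = {
    ("bgc1", "bgc2"): 0.10,
    ("bgc2", "bgc3"): 0.25,
    ("bgc3", "bgc4"): 0.80,
}
toy_families = group_into_families(toy_names, toy_distances)
singletons = [f for f in toy_families if len(f) == 1]
```

In this toy case bgc1, bgc2 and bgc3 form one family through chained links, while bgc4 remains an individual cluster.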
3.2. Terpene biosynthetic pathway of pathogenic Fusarium species
A total of 430 fundamental genes involved in terpene production were identified from the set of 1733 BGCs using antiSMASH. These genes encode enzymes such as sesquiterpene synthase, geranylgeranyl pyrophosphate (GGPP) cyclase, sesterterpene synthase, triterpene synthase, and lycopene cyclase/phytoene synthase, as well as GGPP synthase for the production of diterpene scaffolds, the conventional prenyltransferases (PTs), and the indole moiety-specific dimethylallyltryptophan synthase (DMATS). Among these genes, the largest group encodes sesquiterpene synthases, accounting for almost half of the total, while the smallest group encodes sesterterpene synthases, with only twelve sequences (
Figure S2).
A phylogenetic tree was constructed using 209 sesquiterpene synthases, and the results showed a strong clustering pattern. Among these clusters, seven identified sesquiterpene synthases provided strong evidence for the identification of related sequences. Ffsc4, a multi-product sesquiterpene cyclase from the pathogenic fungus
F. fujikuroi, produces not only the 4/9 bicyclic 2-
epi-(
E)-β-caryophyllene and (
E)-β-caryophyllene, but also the 11-membered α-humulene [
34]. Ffsc4 has 29 homologous sequences found in 35 selected pathogenic
Fusarium species, and their sequence identities are higher than 69% (
Table S4). Therefore, it is speculated that the products of this cluster are identical or similar to these three sesquiterpenes. Ffsc6 is another multi-product sesquiterpene cyclase from
F. fujikuroi, which produces
α- and
β-cedrene,
α-acoradiene,
α-alaskene, and
β-bisabolene along with other sesquiterpenes [
34]. Ffsc6 has seventeen homologous sequences among these selected pathogenic
Fusarium species, and their sequence identities are higher than 59% (
Table S5). It is speculated that the products of this cluster are the same or similar to these five sesquiterpenes. STC5 and STC3 are the other two sesquiterpene cyclases from
F. fujikuroi [
35]. The former has 19 sequences with high sequence identities (>78%,
Table S6), and their products are probably the same as guaia-6,10(14)-diene, a 5/7 bicyclic sesquiterpene synthesized by STC5 [
35]. The latter, whose product is the bicyclic sesquiterpene eremophilene (A10) [
35], has two homologous sequences with identities as high as 81% (
Table S7).
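The identity thresholds quoted for each cluster depend on how percent identity is defined. The sketch below is one common convention, assuming the two sequences are already aligned to equal length and that columns containing a gap are excluded from the denominator. The text does not state which definition was used, so the convention, the function names and the toy sequences are all assumptions for illustration.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity over gap-free columns of two aligned sequences.

    Assumes `aln_a` and `aln_b` come from the same alignment (equal length).
    Excluding gapped columns is one convention among several.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aln_a, aln_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def homologs_above(reference, candidates, threshold):
    """Names of candidates whose identity to `reference` exceeds `threshold`."""
    return sorted(
        name for name, seq in candidates.items()
        if percent_identity(reference, seq) > threshold
    )


# Toy aligned sequences (hypothetical, not real synthase sequences).
toy_ref = "MKT-LLA"
toy_candidates = {"cand1": "MKTALLA", "cand2": "MRT-LVA"}
toy_hits = homologs_above(toy_ref, toy_candidates, 69.0)
```

Here cand1 matches the reference at all six compared columns, while cand2 matches at four of six and falls below the illustrative 69% threshold.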
CML1 is a sesquiterpene alcohol synthase found in
F. graminearum, a pathogenic fungus affecting cereal crops [
36]. CML1 is responsible for the biosynthesis of longiborneol [
37]. CML1 and six sesquiterpene synthases from pathogenic
Fusarium species form a cluster with more than 71% sequence identity (
Table S8). It is speculated that the products of this cluster are identical or similar to longiborneol. Trichodiene synthase, also known as TOX5 (TRI5), was initially identified in the pathogenic fungi
Gibberella pulicaris (
F. sambucinum) [
38] and
F. sporotrichioides [
39], and subsequently in
F. pseudograminearum [
40] and
F. graminearum [
41]. Members of the TRI5-containing cluster share more than 85% sequence identity (
Figure 3,
Table S9), and these putative BGCs contain TRI5 homologous sequences with high similarity (
Figure 4A). Trichodiene serves as the basic structure for various fungal sesquiterpene toxins, including DON, nivalenol (NIV), and T-2 toxin (
Figure 4B), which are produced by
Fusarium and
Stachybotrys species [
42]. FlvE, a terpene cyclase responsible for the synthesis of (1
R,4
R,5
S)- (+) -acoradiene in
Aspergillus flavus, has five homologues with more than 47% sequence identity in the selected
Fusarium species (
Table S10). Further comparison of the BGCs revealed that four genes in the
flvBGC share similarities with genes from
Fusarium-derived BGCs (
Figure S3).
In filamentous fungi, the synthesis of GGPP and its subsequent cyclisation is carried out by two different enzymes which work together to produce the diterpene backbone. In these selected
Fusarium species, the 87 enzymes involved in the synthesis of the diterpenoid skeleton have been classified into three groups, consisting of 40 GGPP synthases, 20 GGPP cyclases, and 27 GGPP-related cyclases respectively (
Figure S4). The gene
dpfgD, which is involved in the biosynthesis of the diterpenoid pyrone subglutinol A [
43], together with three GGPP synthases, forms a smaller subgroup within the GGPP synthase group with an identity of over 86% (
Table S11). On the other hand, the gene GGS [
44], together with the remaining 34 GGPP synthases, forms a larger subgroup with more than 81% identity (
Table S12). The gene CPS/KS [
16,
45], an
ent-kaurene synthase identified in
F. fujikuroi, plays a key role in the biosynthesis of gibberellin. The sequence identity between CPS/KS and the other members of the GGPP synthase group is over 41% (
Table S13). The terpene cyclase gene,
dpfgB, is responsible for the cyclisation of oxidized furanone diterpene [
43], and its 26 homologues have been identified in selected
Fusarium species, all of which share more than 57% sequence identity (
Table S14). Two diterpenoid synthesis gene clusters, GABGC and
dpfgBGC, have been discovered in
Fusarium and have served as lead examples in the search for several similar BGCs in the selected 35
Fusarium species (
Figure S4-S5).
Sesterterpenoids are a minority within the terpene family, and filamentous fungi are their primary producers. In
Fusarium species, these compounds are mainly derived from their pathogenic members. Fusaproliferin is a toxic compound found in the eggplant-disease causing pathogen
F. solani [
46]. Mangicdiene and variecoltetraene are products catalyzed by FgMS from
F. graminearum [
47,
48], while fusoxypenes A-C are products catalyzed by FoFS from
F. oxysporum [
49] (
Figure 5A). Six sesterterpene synthases were identified in the 35
Fusarium species, including two characterised chimeric enzymes, FgMS in
F. graminearum [
47,
48] and FoFS in
F. oxysporum [
49]. The three-dimensional structures predicted with AlphaFold revealed that the four putative sesterterpene synthases, as well as the two identified chimeric enzymes, possessed two relatively independent functional domains (
Figure S6). Cluster analysis of these twelve sequences divided them into three clades (
Figure 5B). The three uncharacterized sesterterpene synthases are on an independent clade with up to 79% sequence identity among the three, while the amino acid sequence identity between these three and the two characterised sesterterpene synthases is no more than 30% (
Figure 5C,
Table S15). FgMS and FGSG_01738 not only show high identity in their primary sequences (
Figure 5C), but also show high similarity, with a 1.012 Å
RMSD in the three-dimensional structures (
Figure 5D). Spatial comparisons also revealed the similarity of the three uncharacterized sesterterpene synthases, with RMSD values ranging from 0.366 Å to 2.259 Å (
Figure 5E).
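The RMSD values quoted above measure the root of the mean squared distance between matched atoms of two structures, RMSD = sqrt((1/N) Σ |a_i − b_i|²). The sketch below computes this quantity under the assumption that the two coordinate sets are already superposed and matched atom for atom. It does not perform the structural fitting step, and it is not the tool the authors used. The function name and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superposed and that atom i in
    `coords_a` corresponds to atom i in `coords_b`; no fitting is done.
    The result is in the same units as the input (e.g. angstroms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates (hypothetical): every atom is displaced by 1 unit along z.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(toy_a, toy_b)
```

Because each toy atom is displaced by exactly one unit, the RMSD is one unit. In practice an optimal superposition is applied first, so that the reported value reflects shape differences rather than placement.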
Epoxysqualene cyclase utilizes 2,3(
S)-epoxysqualene as a substrate to synthesize the triterpene backbone with diverse structural characteristics. Among the selected
Fusarium species, ten lanosterol synthases and one squalene hopane cyclase (SHC), FDECE_14603, were identified (
Figure S2). The ten putative lanosterol synthases shared a significant identity (>49%) with Erg7 from
F. graminearum [
50] (
Table S16). FDECE_14603 displays 55% sequence identity with the identified SHC, Aafum, from the human pathogen
A. fumigatus [
51]. Carotenoids are a group of natural terpenoid pigments with a C40 backbone abundant in filamentous fungi. In
Fusarium fujikuroi, the complete carotenoid biosynthesis pathway has been elucidated (
Figure 6), with carRA being the crucial gene responsible for the synthesis of the C40 backbone [
52,
53]. Using
carRA as a reference, seventeen homologous genes were screened, and these genes exhibited 94% sequence identity (
Table S17). Further exploration led to the discovery of putative gene clusters in which these seventeen genes were located, and these clusters showed substantial similarity to the clustered portion of carotenoid biosynthetic genes. In addition, fifteen other enzymes associated with C40 backbone synthesis were identified (
Figure S7).
A total of 68 PTs (prenyltransferases) and DMATSs (dimethylallyltryptophan synthases) were screened, including DMATS1 previously identified in
F. fujikuroi [
54]. There are fifteen homologous sequences to DMATS1 in
Fusarium species, and their amino acid sequence identity exceeds 64%. It is speculated that these fifteen PTs, similar to DMATS1, are responsible for the trans-prenylation of the N1 position of the indole group (
Table S18). The other two sequences, FOYG_11805 and FACUT_12572, share 77.40% and 76.39% identity, respectively, with 7-DMATS from
A. fumigatus [
55]. It is speculated that these two sequences are also involved in the cis-prenylation of the C7 position of the indole group. Additionally, an uncharacterized PT, FsdK [
56], and its two homologues also exhibited a high degree of similarity, with an identity greater than 76%.
3.3. Nonribosomal peptide biosynthetic pathway of pathogenic Fusarium species
A total of 400 NRPSs were identified in 35 selected pathogenic
Fusarium species, including 22 identified NRPSs of
Fusarium species. The homologues of 22 identified NRPSs were screened based on the cluster analysis of the phylogenetic tree (
Figure 7).
NPS1 is an NRPS from
Histoplasma capsulatum involved in extracellular siderophore production [
57], and four of its paralogues were screened in
Fusarium, with 54% sequence identity. Further analysis of BGCs found that the gene cluster containing NPS1 was highly similar to six putative gene clusters derived from
Fusarium (
Figure S8). NPS2, the core enzyme of ferricrocin synthesis from
F. graminearum, has 32 homologous proteins [
58]. The comparison found that they have a highly consistent domain composition (
Figure S9). NPS6, another NRPS from
F. graminearum involved in iron carrier-mediated iron metabolism, also has 32 homologous proteins. The structural features of the remaining 30 homologous sequences are highly consistent with that of NPS6 [
59], except for the partial deletion of the structural domains of FVEG_15442 and FVEG_15444 (
Figure S10). SidE is an NRPS from
A. fumigatus involved in siderophore-mediated iron metabolism [
60]. In these selected
Fusarium species, 19 homologous sequences most similar to SidE were screened out, and domain feature analysis found that the domain composition of these nineteen homologs was highly consistent. In contrast, the N-terminus of SidE lacks a C domain (
Figure S11). SidC is another NRPS from
A. fumigatus involved in iron carrier-mediated iron metabolism [
61], and the six homologous sequences most similar to it were screened for in these
Fusarium species. Domain analysis showed that the domain composition of NCS54_013740003, FDECE_13287, and FPRO_13979 is highly consistent with SidC, while the remaining three differ significantly from SidC (
Figure S12). ESYN1 from
F. oxysporum was shown to be responsible for the synthesis of cyclic depsipeptides enniatins [
62]. Five homologues of ESYN1 were found, and except for FGRM_2752, which lacks an MT domain, the domain composition of the other four homologues was almost identical to that of ESYN1 (
Figure S13). PesF (Afu3g12920) from
A. fumigatus is a putative ETP toxins synthetase, and five NRPSs with highly similar domain characteristics to AF_NRPS5 were screened (
Figure S14). HTS1 is a synthase of HC-toxin from
Cochliobolus carbonum [
63]. Among the selected
Fusarium species, there is a putative NRPS, jgi.p_Fustri1_636519, which is relatively homologous to HTS1. Domain comparison revealed that jgi.p_Fustri1_636519 lacked a T domain compared with HTS1 (
Figure S15).
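Domain comparisons of this kind, such as a homologue lacking a C, MT or T domain relative to a reference NRPS, amount to comparing two ordered lists of domain labels. The sketch below reports which domains occur more often in a reference than in a query. The function name and the domain lists are hypothetical and do not reproduce the actual architectures of the enzymes discussed above.

```python
from collections import Counter


def missing_domains(reference, query):
    """Domains that occur more often in `reference` than in `query`.

    Both arguments are lists of domain labels (e.g. "A", "T", "C").
    Only counts are compared; domain order is ignored in this sketch.
    """
    shortfall = Counter(reference) - Counter(query)  # keeps positive counts only
    return sorted(shortfall.elements())


# Toy architectures (hypothetical, for illustration only).
toy_reference = ["A", "T", "C", "A", "T", "C"]
toy_query = ["A", "T", "C", "A", "C"]
toy_missing = missing_domains(toy_reference, toy_query)
```

A count-based comparison flags the missing T domain in the toy query, but it would not detect reordered or truncated domains, for which a positional alignment of the domain strings would be needed.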
The virulence factor beauvericin is synthesised by NRPS encoded by
bbBeas from
Beauveria bassiana [
64]. Related studies have identified two paralogous homologues of bbBeas in
Fusarium species,
fpBeas [
65] and
BEA1 [
66]. In this work, seventeen further homologues of BbBeas were identified, and BbBEAS and its 19 paralogous homologues formed distinct clusters in the evolutionary tree (
Figure 7). Comparative domain (
Figure 8A) and identity analysis (
Table S19) revealed high similarities between BbBEAS and 17
Fusarium-derived BEAS, except for partial amino acid deletions in FVEG_16703 and J7337_011263. Beauvericin is the product of a collaboration between BEAS and KIVR in which KIVR converts 2-ketoisovalerate from primary metabolism into one of the initial substrates of BEAS,
D-2-hydroxyisovalerate (
D-Hiv). Subsequently,
D-Hiv and another initial substrate,
L-phenylalanine (Phe), are catalysed by BEAS-related structural domains to form dipeptidol monomer and three dipeptidol monomers are esterified to form beauvericin [
64] (
Figure 8B). Comparison of BGCs revealed that
Fusarium-derived
Bea_BGCs are highly conserved and similar, although most differ from
bbBea_BGC in carrying one additional ABC transporter-encoding gene upstream of kivr (
Figure S16).
The NRPS coding gene
aclP is the core gene responsible for aspirochlorine biosynthesis identified in
A. oryzae [
67], and its thirteen paralogous genes were screened in these
Fusarium species. The amino acid sequence identity comparison showed that the identity of AclP with the thirteen paralogous sequences was not more than 40%, and the thirteen paralogous sequences were all higher than 80% with each other (
Figure S17A). Domain comparison revealed high identities between AclP and its homologues, except for premature termination of some entries (
Figure S17B). It is speculated that these NRPSs with intact T-C-A-T-C domains, like AclP, catalyze the formation of
cyclo-(
L-Phe-
L-Phe) (
Figure S17C). Comparison of BGCs revealed similarities between several genes in the
aclBGC and related genes in the
Fusarium-derived BGCs (
Figure S18).
NRPS4 is the gene encoding the cyclic hexapeptide synthase (FGSG_02315) identified in
F. graminearum [
68]. Nine paralogues sharing more than 70% identity with NRPS4 (
Figure 9A) were screened in the present selection of
Fusarium. Further domain features revealed a high degree of identity between NRPS4 and the nine orthologues, with the exception of FPOAC1_014147 and FPOAC1_013344 (
Figure 9B). It is assumed that homologues such as FVRRES_02708, like NRPS4, are also able to synthesise the cyclic hexapeptide fusahexin using related amino acids as substrates (
Figure 9C). Comparison of BGCs containing these fusahexin synthases further revealed that they are highly similar (
Figure S19).
The genes encoding the virulence factor linear octapeptide fusaoctaxin A synthase,
nrps5 (FGSG_13878) and
nrps9 (FGSG_10990), were identified in the wheat pathogen
F. graminearum [
69]. Eight nrps9 paralogous homologs were screened in this work, and all nine NRPSs showed amino acid sequence identity above 65% (
Figure 10A). Structural domain analysis showed a high degree of identity among the structural domains of these sequences, with the exception of LC18013491 (
Figure 10B). Further BGC comparisons showed that all eight
Fusarium species, including
Fusarium pseudograminearum, contained BGCs that were highly similar to
fg3_54 BGC (
Figure S20). Therefore, it is hypothesized that the biosynthetic pathway of fusaoctaxin A is commonly distributed in all nine
Fusarium species (
Figure 10C).
NRPS32 (FPSE_09183) and PKS40 (FPSE_09187) are two core genes identified in
F. pseudograminearum responsible for the biosynthesis of hybrid compound W493 B [
70]. Two paralogous genes were found for NRPS32, and their sequence identity exceeds 88% (
Figure 11A). Domain comparison indicated that the three sequences share identical domain organization (
Figure 11B), and further analysis revealed a high degree of similarity in the BGCs where they are located (
Figure 11C). Therefore, it can be inferred that these two putative BGCs are also involved in the biosynthesis of W493 B (
Figure 11D). Similarly, NRPS7 (FGSG_08209) and PKS6 (FGSG_08208) were identified as two core genes responsible for the biosynthesis of the hybrid compound fusaristatin A in
F. graminearum [
71]. Five paralogous genes were identified for NRPS7, and their sequence identity exceeds 70% (
Figure 12A). Domain comparison demonstrated complete domain conservation among these genes (
Figure 12B), and further analysis revealed high similarity among the BGCs in which the core genes are located (
Figure S21). Therefore, it is speculated that these putative BGCs also have the ability to biosynthesize fusaristatin A (
Figure 12C).
NRPS30 (MAA_10043) is the core gene identified in
Metarhizium robertsii responsible for the cyclic pentapeptide, sansalvamide [
72] (
Figure S22A). Five paralogous genes of NRPS30 were identified in
Fusarium, and the sequence identity of NRPS30 with these five NRPSs was not more than 50%, whereas the five NRPSs had more than 70% sequence identity with each other (
Figure S22B). Domain analysis found that five homologues of NRPS30 lacked the fifth module relative to NRPS30 (
Figure S22C), and further comparison of BGCs showed that BGCs from
Fusarium also contained homologous genes of the P450 encoding gene (MAA_10043) (
Figure S22D). Chry1 (NRPS14, FGSG_11396), the NRPS responsible for the biosynthesis of the alkaloid chrysogines, was identified in
F. graminearum [
73], and the homologues of Chry1 were also screened in seven other
Fusarium species. Amino acid sequence identity analysis showed that these NRPSs shared more than 60% sequence identity with each other and domain analysis revealed a high degree of similarity in the composition of the domains of the NRPSs, except for the inappropriately annotated
F. poae origin NRPS. Further analysis revealed that
chryBGCs were present in multiple
Fusarium species and that these BGCs were highly similar (
Figure S23).
GRA1 (
NRPS8, FGSG_15673) is a core gene in the biosynthesis of the bicyclic toxic lipopeptides Gramillins in
F. graminearum [
74]. Two homologous genes of
GRA1, HYE67_007954 and FGFSG_11659, were found in
F. poae and another subspecies of
F. graminearum. Domain comparison showed that the domain composition of GRA1 and HYE67_007954 was highly consistent, while FGFSG_11659 lacked some domains compared to GRA1. BGC comparison shows that
GRABGC is highly similar to the BGCs containing HYE67_007954 and FGFSG_11659, and it is speculated that these two BGCs may also produce similar cyclic peptide compounds (
Figure S24).
Aps1 and
APF1 are core genes responsible for the synthesis of the cyclic tetrapeptides apicidin F and apicidin from
F. semitectum [
45] and
F. fujikuroi [
75], respectively. Their homologs, B0J16DRAFT_375847 and FPOAC1_013755, were screened in two other
Fusarium species, and the structural compositions of these four NRPSs were highly consistent. Further BGC comparisons showed that
ApsBGC and
APFBGC were highly similar to the two putative BGCs (
Figure S25).
The cyclic peptide FR901469 was identified as being synthesised by the NRPS FrbI (AN011243_029940) encoded by the unknown fungal species No. 11243 [
76], and several frbI-like genes were screened in
Fusarium strains. A comparison of the structural domains revealed that the NRPSs from the
Fusarium species lacked most of the domains compared to FrbI (
Figure S26A). NRPSs from three different
Fusarium strains, J7337_003370, FNAPI_7159 and FVRRES_13918, showed a certain degree of conservation with the PKS-NRPS1 (FFUJ_02219) [
57] from
F. fujikuroi. Comparison of the structural domains revealed that the amino acid sequences of these three NRPSs were similar to the NRPS modules of the hybrid enzyme PKS-NRPS1 (
Figure S26B). The NRPS FMAN_12219 from
F. mangiferae showed some similarity to another hybrid enzyme FUS1 (FFUJ_10058) derived from
F. fujikuroi [
77], and a comparison of the structural domains also revealed a high degree of correspondence between the amino acid sequence of FMAN_12219 and the NRPS module of the hybrid enzyme PKS-NRPS1 (
Figure S26C).
3.4. Polyketide biosynthetic pathway of pathogenic Fusarium species
In this comprehensive analysis of pathogenic
Fusarium, a total of 522 PKSs were identified in the 35 selected strains. These PKSs synthesize polyketide compounds, many of which contribute to the virulence and pathogenicity of the fungal species. Using phylogenetic clustering analysis, twenty-three distinct PKS clades were functionally identified (
Figure S27).
Gibepyrones are fungal toxins that have been isolated from the rice pathogen
F. fujikuroi, and their biosynthetic pathway has also been elucidated in
F. fujikuroi [
78]. The PKS-encoding gene GPY1 is considered to be the core gene involved in the biosynthesis of gibepyrones, especially gibepyrone A. Notably, the cluster analysis revealed the widespread presence of GPY1 homologs among the 35 selected pathogenic
Fusarium species. Amino acid sequence analysis revealed that GPY1 shares more than 75% sequence identity with its homologs (
Figure 13A), and further investigation of the domain composition of these PKSs indicates high conservation (
Figure 13B). The putative BGCs containing GPY1 homologs are commonly found in
Fusarium species, with striking similarities to the
GPYBGC (
Figure S28). Fusarubins, another class of polyketides, have also been isolated from
F. fujikuroi, and their biosynthetic pathway has been identified in this fungal strain as well [
79]. The BGC responsible for fusarubins consists of six genes, with
fsr1 identified as the core gene encoding for the creation of the fusarubins skeleton, specifically 6-
O-methylfusarubin [
79]. The homologs of
fsr1 were found to exist in all 35 selected pathogenic
Fusarium species, and their amino acid sequences exhibited a remarkable level of identity exceeding 75% (
Figure 14A). Additionally, the analysis of domain compositions indicated high similarity (
Figure 14B). The
fsr-like BGCs were found to be widely present across pathogenic
Fusarium species (
Figure S29).
Fusaric acid, a notorious mycotoxin known to cause extensive damage to plants, is produced by a BGC that has been identified in various
Fusarium species [
11], including
F. fujikuroi. The
FUB1 gene, which encodes a highly reductive PKS, is considered a key gene within the
FUBBGC [
80]. Homologues of FUB1 were found to have a wide distribution in over twenty
Fusarium species, displaying up to 93% sequence identity at the amino acid level (
Figure 15A) and highly conserved domain structures (
Figure 15B). Further analysis revealed the presence of the predicted
FUBBGC in several pathogenic
Fusarium species, with significant similarity to previously identified
FUBBGCs (
Figure S30).
Bikaverin, a strikingly pigmented compound, was initially identified in cultures of
F. lycopersici and
F. vasinfectum [
81]. The BGC responsible for bikaverins was discovered in
F. fujikuroi [
82]. The initiation of
bikaverin biosynthesis is mediated by the PKS-encoding gene
bik1, and Bik1 utilizes acetyl-coenzyme A and malonyl-coenzyme A to produce the bicyclic precursor of bikaverin, referred to as pre-bikaverin. Nineteen homologues of Bik1 were identified, with amino acid sequence identities exceeding 81% (
Figure 16A). Analysis of the domain composition revealed striking similarities between Bik1 and its homologs (
Figure 16B). Further investigation uncovered the widespread presence of predicted bikBGCs in pathogenic
Fusarium species, which exhibited high similarity to the identified
bikBGC (
Figure S31).
FmFPY1 (
FmPKS40), a key gene involved in the biosynthesis of fusapyrone and deoxyfusapyrone [
70], has been found to have fifteen homologues. The sequence identities between FmFPY1 and its homologues surpass 81%. Despite the absence of the C-terminal domain in J7337_000001 and FGLOB1_11207, the other PKSs share remarkably similar domain compositions. Putative BGCs containing
FmFPY1 homologs have been identified in multiple pathogenic
Fusarium species, which show high similarity (
Figure S32).
The gene
fogA, derived from
A. ruber, serves as the core gene responsible for flavoglaucin biosynthesis [
83]. Several homologues of FogA have been discovered in pathogenic
Fusarium species. The sequence identity between FogA and its homologues exceeds 50%, while the
Fusarium-derived homologues show more than 90% identities (
Figure 17A). A comparison between
fogBGC and putative BGCs containing
fogA homologues from
Fusarium indicates some similarity, whereas the
Fusarium-derived putative BGCs show high similarity (
Figure 17B). Furthermore, putative
fogBGCs were discovered in twelve pathogenic
Fusarium species, which exhibit high similarity to the
fogBGC (
Figure S33). SdnO, a PKS identified in the BGC responsible for sordarin in
Sordaria araneosa, plays a crucial role in the synthesis of the glycolipid sidechain of the sordarin structure [
84]. Screening of pathogenic
Fusarium species led to the discovery of five homologs to SdnO (
Figure S34A). Although the identity between SdnO and these homologues does not exceed 40%, the identities among these homologs themselves surpass 60%. Further comparisons revealed a remarkable similarity in domain features between four of these homologues and SdnO (
Figure S34B).
YWA1 serves as an intermediary compound in the biosynthesis of aurofusarin, a pigment toxin found in
F. graminearum [
2]. The biosynthesis of aurofusarin is initiated by PKS12 [
85], which is encoded by the
fusBGC. Through sequence analysis, eight PKS12 homologues with a sequence identity exceeding 75% were identified (
Figure 18A). Additionally, these homologues exhibited significant domain similarity (
Figure 18B). Examination of predicted BGCs containing PKS12 revealed their presence in pathogenic
Fusarium species, further highlighting their similarity to the
fusBGC (
Figure 18C). Hence, it can be inferred that aurofusarin is a commonly produced pigment toxin in these fungi. Depudecin, a linear polyketide with eleven carbon atoms, was isolated from the pathogenic fungus
Alternaria brassicicola [
86]. The core gene responsible for the biosynthesis of depudecin is
DEP5 [
86]. Screening identified nine homologous sequences of DEP5 that share more than 65% sequence identity and have similar domain compositions. This suggests a conserved function across these homologues. Furthermore, putative
DEPBGCs were discovered in eight pathogenic
Fusarium species, and they are highly similar to the
DEPBGC (
Figure S35).
PKS6 is considered one of the pivotal genes involved in the biosynthesis of the cyclic peptide fusaristatin A [
71]. This gene operates in conjunction with another core gene, NRPS7, to synthesize fusaristatin A (
Figure 12). Five highly similar homologues of PKS6 have been identified based on both amino acid sequence identity and domain composition (
Figure S36). Similarly, PKS40 and NRPS32 form another pair of synergistic core genes responsible for the production of W493 B [
71] (
Figure 11), and two homologues of PKS40 with identical domain structures were identified (
Figure S37). Furthermore, in
F. fujikuroi,
PKS19 serves as the core gene for the biosynthesis of α-pyrones (fujikurins). Three homologues of PKS19 have been identified in pathogenic
Fusarium species, and their domain features closely resemble each other (
Figure 19A). Additionally, the comparison of BGCs revealed a remarkable similarty between the presumed BGCs containing PKS19 homologues and the fujikurins BGC (
Figure 19B).
Alt5 has been recognized as the core gene responsible for the biosynthesis of alternapyrone in
A. solani [
87]. Four PKSs were identified that shared more than 70% identity with Alt5. These PKSs exhibited consistent domain features, and their corresponding BGCs displayed a notably high similarity (
Figure S38). In addition, the core gene
sol1, which is responsible for the synthesis of solanapyrone, was identified in
A. solani [
88]. Three homologues of sol1 were screened in
Fusarium species, and these four PKSs exhibited very similar domain characteristics (
Figure S39).
DpfgA, identified in
F. graminearum, functions as a core gene responsible for the polyketide portion of subglutinol biosynthesis [
43]. Through screening, four homologues of DpfgA with highly similar domain features were identified in other pathogenic
Fusarium species (
Figure S40).
FSL1, the core gene responsible for fusarielin biosynthesis, was also identified in
F. graminearum [
89]. It cooperates with FSL5 to complete the backbone synthesis of fusarielins. Five homologs of FSL1 were screened, and they displayed a high degree of similarity in their domain compositions (
Figure S41A). Moreover, predicted
FSLBGCs were discovered in the corresponding strains, exhibiting considerable similarity to the
FSLBGC (
Figure S41B). Another pair of genes,
bet1 and
bet3, function collaboratively to form a polyketide skeleton [
90], with Bet1 belonging to the type I HR PKS. A
bet-like BGC was identified in
F. decemcellulare, wherein three genes displayed significant homology with
bet1,
bet3, and
bet4, respectively (
Figure S42). The core gene
G433, responsible for the synthesis of 1233A, was identified in
Fusarium sp. RK97-94 [
91,
92]. Through screening, four homologues of G433 were identified in the 35 selected pathogenic
Fusarium species. G433 and these homologues show high similarity in domain composition, and the corresponding BGCs display significant homology (
Figure S43).
The gene
FUM1, which encodes the PKS involved in fumonisin synthesis, is considered to be the key gene in this process [
93,
94]. Six homologues of FUM1 have been identified by screening. FUM1 and its homologues share not only a high degree of similarity in their amino acid sequences (
Figure S44A), but also a close resemblance in the composition of their domains (
Figure S44B). Putative
FUM BGCs were identified in related species and showed striking similarity to the characterized
FUM BGC (
Figure S44C). In the pathogenic
G. zeae, two core genes,
zea1 (pks13) and
zea2 (pks4), were identified as essential for zearalenone synthesis [
95,
96]. These two genes work together to produce the linear backbone structure of zearalenone. Several putative
zea BGCs have been identified in pathogenic
Fusarium species, and their core enzymes, which are homologues of Zea1 and Zea2, showed remarkable domain similarities to Zea1 and Zea2 (
Figure S45). In addition, a pair of synergistic PKS-encoding genes, pkhA and pkhB, were identified as the core genes responsible for alternariol biosynthesis [
97]. Several putative
pkh BGCs were screened in pathogenic
Fusarium species, and their core enzymes, which are homologues of PkhA and PkhB, showed high structural similarity to PkhA and PkhB (
Figure S46).
3.5. PKS-NRPS biosynthetic pathway of pathogenic Fusarium species
In fungi, the co-localization of PKS and NRPS genes gives rise to PKS-NRPS hybrid enzymes. These enzymes comprise a PKS module, which contains domains such as KS, AT, DH, MT, KR, and ACP, and an NRPS module, which consists of C, A, and T domains. Within the hybrid enzyme, the PKS module primarily mediates carbon-chain elongation, while the NRPS module uses its A domain to selectively activate a specific amino acid and load the resulting aminoacyl residue onto the T domain. Once assembly of the polyketide chain is complete, the C domain condenses the polyketide chain with the activated amino acid, yielding an amide-linked product. The first characterized PKS-NRPS was discovered in the genus
Fusarium, and thus far, a total of six PKS-NRPSs have been deciphered from different
Fusarium species. Phylogenetic analysis of the 88 PKS-NRPSs screened from the thirty-five pathogenic
Fusarium species, together with five characterized PKS-NRPSs derived from non-
Fusarium species, resolved them into distinct clusters (
Figure 20).
One notable example of a PKS-NRPS hybrid phytotoxin is fusarin C, which was identified in maize infected with the plant pathogenic fungus
F. moniliforme in 1981 [
98]. The core genes responsible for the biosynthesis of fusarin C, FusA and Fus1, were subsequently identified in
F. moniliforme [
99] and
F. fujikuroi [
77], respectively. Interestingly, eighteen homologues of FusA and Fus1 have been found in other pathogenic
Fusarium species, and these twenty sequences make up the largest cluster of the PKS-NRPS collection. These hybrid enzymes share more than 70% amino acid sequence identity with one another (
Figure 21A). Apart from five sequences that contain an additional ER domain, the domain composition of the remaining sequences is consistent with that of FusA and Fus1 (
Figure 21B). It is hypothesized that these homologous sequences, like FusA and Fus1, synthesize pre-fusarin C with homoserine, malonyl-CoA and S-adenosyl-
L-methionine (SAM) as substrates (
Figure 21C). Furthermore, putative
Fus BGCs have been discovered in eighteen additional pathogenic
Fusarium species, which show significant similarity to the
Fus BGC associated with the biosynthesis of fusarin C (
Figure S46). Another related compound, lucilactaene, a structural analogue of fusarin C, has been isolated from
Fusarium sp. RK97-94. The core gene responsible for lucilactaene biosynthesis,
luc5 [
92], has four homologues in pathogenic
Fusarium species, and these five sequences exhibit over 92% identity (
Figure 22A). Not only do their domain features closely resemble those of Luc5 (
Figure 22B), but the BGCs containing PKS-NRPSs in these species also show high similarity to the
luc BGC (
Figure S47). Presumably, these homologous sequences, like Luc5, synthesise analogues of pre-fusarin C with the same substrates as FusA and Fus1 (
Figure 22C).
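The identity values quoted in this section are pairwise comparisons over aligned sequences. As a minimal illustrative sketch (not the pipeline used in this study), percent identity can be computed from two rows of an alignment as shown here. The function name, the toy sequences, and the choice to skip columns containing a gap in either row are assumptions made for illustration; tools that normalize by full alignment length or by the shorter sequence will report somewhat different values.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of the same alignment.

    Assumption: columns with a gap in either row are skipped, so the
    denominator is the number of columns where both rows have a residue.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue  # gap column: not counted in this definition
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0  # no residue-residue columns to compare
    return 100.0 * identical / compared


# Hypothetical toy alignment, not real FusA, Fus1 or Luc5 sequence data.
toy_reference = "MKT-ALSG"
toy_homologue = "MKTGAVSG"
print(f"{percent_identity(toy_reference, toy_homologue):.1f}% identity")
```

Under this definition, a threshold such as "more than 70% identity" is applied to each pair of sequences separately, so a cluster-level statement requires every pairwise value to exceed the threshold.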
Cluster analysis revealed eighteen novel PKS-NRPSs that form the second-largest clade within the hybrid collection. These newly identified sequences diverge from previously characterized PKS-NRPSs, sharing less than 40% sequence similarity and identity with them. Within the clade, however, the sequences are highly similar to one another, with identity values reaching up to 80% (
Figure S48A). Domain alignment reveals a nearly identical domain composition (KS-AT-DH-MT-KR-ACP-C-A-T-R-ER) among these newly identified sequences (
Figure S48B). Similarly, the putative BGCs in which these PKS-NRPSs serve as core genes also exhibit significant similarity (
Figure S48B). Based on these findings, we propose that these newly discovered PKS-NRPSs may represent a previously unknown class of hybrid enzymes. It is also conceivable that the BGCs containing these hybrid enzymes are involved in the synthesis of novel fungal toxins.
Sambutoxin, a mycotoxin, was initially discovered in the potato pathogen
F. sambucinum [
100]. The core gene responsible for the biosynthesis of sambutoxin, known as
smbB, was identified in
F. commune [
25]. Fourteen sequences similar to SmbB have been found in other pathogenic
Fusarium species, with over 75% similarity in their amino acid sequence (
Figure 23A). Further analysis of the structure shows a high consistency in the domain features among these sequences (
Figure 23B), suggesting that these SmbB homologues use phenylalanine, acetyl-CoA, malonyl-CoA, and SAM as substrates to synthesize a hybrid scaffold, which serves as the precursor for mycotoxins (
Figure 23C). The putative smb BGCs were identified in the corresponding fifteen pathogenic
Fusarium species, and these putative BGCs closely resemble the
smb BGC (
Figure S49).
Equisetin and trichosetin, naturally occurring tetramic acids derived from PKS-NRPSs, are phytotoxic and cytotoxic. These compounds are produced by pathogenic
Fusarium species. The core genes involved in their biosynthesis,
fsa1 [
101],
eqiS [
24], and
FFUJ_02219 [
102], have been identified in
Fusarium sp. FN080326,
F. heterosporum, and
F. fujikuroi, respectively. A total of fifteen sequences similar to equisetin synthetase were identified in pathogenic
Fusarium species; together with the three characterized synthetases, these eighteen sequences share more than 75% sequence identity among themselves (
Figure 24A). Domain analysis indicates that these eighteen sequences have highly similar structural features (
Figure 24B). Based on the known equisetin synthetases, it is suggested that these hybrid enzymes also employ serine and coenzyme A derivatives in the synthesis of equisetin-type compounds (
Figure 24C). The putative BGCs for equisetins were identified in the fifteen respective pathogenic
Fusarium species, and these BGCs showed a high level of similarity to each other (
Figure S50).
The core gene responsible for the biosynthesis of ilicicolin H in
Penicillium variabile,
iccA, is a hybrid gene consisting of multiple modules [
103]. IccA and IccB work together to create the hybrid scaffold [
103]. Six homologues of IccA have been identified in pathogenic
Fusarium species. The amino acid sequence identity between IccA and these six
Fusarium-derived PKS-NRPSs ranges from 50% to 60%, while the identity among the six
Fusarium-derived PKS-NRPSs themselves exceeds 70% (
Figure 24A). The alignment of their domains shows a high congruence between the domain composition of IccA and those of the six
Fusarium-derived PKS-NRPSs (
Figure 24B). Therefore, it is hypothesized that these six
Fusarium-derived PKS-NRPSs, similar to IccA, utilize tyrosine, SAM, and coenzyme A derivatives to synthesize tetramic acid intermediates (
Figure 24C). Based on the predictions from antiSMASH, putative
icc BGCs have been identified in the corresponding six pathogenic
Fusarium species, and these BGCs show remarkable similarity (
Figure S51).
The PKS-NRPS FsdS, derived from
F. heterosporum [
56], consists of ten domains (KS-AT-DH-MT-KR-ACP-C-A-T-R), where the A domain is responsible for the activation of
L-tyrosine [
56]. Comparison of the amino acid sequences of FDECE_13779 and FGRMN_3691, two PKS-NRPSs from pathogenic
Fusarium species, showed that they share 72% sequence identity with FsdS (
Figure S52A). Domain alignment of these three sequences indicated that their domain compositions were identical (
Figure S52B). Further comparison of the BGCs revealed a significant similarity between the BGCs containing FDECE_13779 and FGRMN_3691 and the
fsd BGC (
Figure S52C). In addition, ACE1 is a key gene involved in the biosynthesis of an avirulence signalling compound in the rice pathogen
Magnaporthe oryzae [
104]. Notably, ACE1 also shows similarity to the same two PKS-NRPSs from pathogenic
Fusarium species, namely FDECE_13779 and FGRMN_3691.
The ACE BGC showed some similarity to the BGCs containing FDECE_13779 and FGRMN_3691, with both the core gene (
Figure S53A) and related functional genes showing high homology (
Figure S53B). Furthermore, the
thnA gene identified in
Trichoderma harzianum serves as a core gene for the synthesis of trihazones [
105]. Interestingly, there is a 69% amino acid sequence identity between ThnA and FSARRC_13765 from
F. sarcochroum. Not only do FSARRC_13765 and ThnA share highly similar domain compositions (
Figure S54A), but they also show considerable similarity within the BGCs in which they are located (
Figure S54B). The PKS-NRPS encoding gene,
cghG, was identified in
Chaetomium globosum [
106]. CghG shares more than 56% amino acid sequence identity with FMUND_12554, and there is some similarity in their domain compositions (
Figure S55). Finally, two novel PKS-NRPSs have been identified in
Fusarium, namely LCI18_013989 and jgi.p_Fustri_620762. The structural composition of LCI18_013989 consists of the domains KS-AT-DH-MT-KR-ACP-C-A-T-R, whereas jgi.p_Fustri_620762 contains the domains KS-AT-DH-MT-KR-ACP-C (
Figure S56). To better understand their functions, these novel PKS-NRPSs require further investigation by heterologous expression.
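The domain comparisons described above can be expressed as a simple comparison of ordered domain lists. The following is a minimal sketch, assuming the hyphen-separated domain strings given in the text; the function names and the returned fields are illustrative and do not correspond to any tool used in this study. Because the check is membership-based, it assumes that no domain type occurs more than once in an architecture, which holds for the strings shown here.

```python
# Domain orders copied from the text; helper names are illustrative only.
ARCHITECTURES = {
    "FsdS": "KS-AT-DH-MT-KR-ACP-C-A-T-R",
    "LCI18_013989": "KS-AT-DH-MT-KR-ACP-C-A-T-R",
    "jgi.p_Fustri_620762": "KS-AT-DH-MT-KR-ACP-C",
}


def parse_domains(architecture: str) -> list:
    """Split a hyphen-separated domain string into an ordered list."""
    return [d.strip() for d in architecture.split("-") if d.strip()]


def compare_to_reference(reference: str, query: str) -> dict:
    """Compare a query domain architecture with a reference architecture."""
    ref = parse_domains(reference)
    qry = parse_domains(query)
    return {
        "identical": ref == qry,
        # True when the query matches the start of the reference,
        # i.e. it looks like a C-terminally truncated version of it.
        "query_is_prefix": ref[: len(qry)] == qry,
        "missing": [d for d in ref if d not in qry],
        "extra": [d for d in qry if d not in ref],
    }


for name in ("LCI18_013989", "jgi.p_Fustri_620762"):
    result = compare_to_reference(ARCHITECTURES["FsdS"], ARCHITECTURES[name])
    print(name, result)
```

With FsdS as the reference, LCI18_013989 is reported as identical in domain order, whereas jgi.p_Fustri_620762 is reported as a prefix of the reference that lacks the A, T and R domains, consistent with the architectures listed above.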