1. Introduction
Hydrocarbon pollution in the environment occurs through human activities or accidentally, resulting in environmental contamination [
1]. Exposure to hydrocarbons can lead to a wide range of health conditions, increasing morbidity and affecting human health [
2]. Hydrocarbons are classified into two groups: aliphatic and aromatic. The term "aliphatic hydrocarbon" refers to saturated compounds, such as alkanes or paraffins, which do not have double or triple bonds. "Unsaturated aliphatic hydrocarbons" have one or more double bonds (alkenes) or one or more triple bonds (alkynes). "Aromatic hydrocarbons" exhibit properties associated with the benzene ring and are known as polycyclic aromatic hydrocarbons (PAHs) [
3]. Due to rapid industrialization and large-scale anthropogenic activities, the level of hydrocarbon pollution continues to increase, which is concerning for the environment. There are many techniques for hydrocarbon remediation, but the use of microorganisms has several advantages, such as cost-effectiveness, minimal or no by-products, and more [
4].
Microbes are primarily used for their rapid growth and ease of manipulation [
5]. The phyla Proteobacteria and Bacteroidetes, which are generally gram-negative bacteria, have potential for hydrocarbon degradation [
6]. Proteobacteria constitute one of the largest and most versatile phyla in the bacterial domain. Single bacteria or consortia of Proteobacteria have been utilized for hydrocarbon degradation, demonstrating their effectiveness in remediating these compounds [
7,
8]. In the last decade, several microorganisms capable of degrading PAHs have been identified and characterized. Additionally, metabolic enzymes have been isolated from microorganisms for degrading various PAH compounds, identifying diverse novel metabolic pathways, including studies on PAH catabolic operons [
9].
In this study, de novo genome sequencing was performed on an enterobacterium isolated from soil samples of the rhizosphere of tara plant (Caesalpinia espinosa) from the Lomas de Lachay National Reserve. Genes involved in the degradation of aromatic hydrocarbons were identified. In addition, a proteome prediction was conducted followed by the construction of three-dimensional molecular structures of putative proteins involved in aromatic hydrocarbon bioremediation. This involved deep learning modeling for molecular docking with the benzopyrene molecule to assess their potential interaction against this compound. To the best of our knowledge, this work provides the first report on bacteria isolated from the Lomas de Lachay reserve and on the genomics and molecular biology of Enterobacter sp. UNJFSC003 in response to polycyclic hydrocarbons.
3. Results
3.1. Isolation of the Strain and Genome Characteristics of Enterobacter sp. UNJFSC 003
The pure strain was successfully isolated on TSA culture medium. Additionally, the colony exhibited a transparent color and a star-shaped form. The Gram stain revealed that it was a Gram-negative bacterium. The biochemical profile included a growth assay [
45] on MacConkey and eosin methylene blue (EMB) media, fermentation of glucose (+), gas production (+), oxidase negative (-), catalase positive (+), and motility test (+).
The total number of raw reads was 10,914,547 sequences, with an average length of 151 base pairs (bp), a GC content of 54%, and a total sequencing output of 1.6 gigabytes (GB). The results of the cut made can be seen in
Table 1.
The de novo assembly was conducted with the unicycler program because it can assemble Illumina read assemblies by working as a SPAdes optimizer, obtaining better results from N50 and longer contings. We tested different k-mers (from 27 to 127). In the assembly quality assessment (Quast), we obtained statistical results of 51 contigs (>= 0 bp) with a GC content of 54.38%. The longest contig obtained was 4,663,955 base pairs (>= 50,000 bp). Additionally, the N50 result was 415,254 and the L50 was 3. In the taxonomic classification analysis of the Enterobacter sp. UNJFSC003 genome assembly conducted by MiGA, the results indicate that our strain belongs to the family Enterobacteriaceae with a p-value of 0.0025. It is likely to belong to the genus Enterobacter (p-value: 0.02) and possibly to the species E. asburiae NZ CP065693 with an average nucleotide identity (ANI) of 87.85%. Additionally, MiGA provided other results such as the consistency and quality of essential genes, with a genome quality value of 93.6%, completeness of 98.1%, and contamination of 0.9% (
Table 2). Finally, in the genome annotation, the following results were obtained: a total genome size of 4,798,267 base pairs (bp), 4,460 protein-coding sequences, 2 rRNAs, 77 tRNAs, and 1 tmRNA. Additionally, functional analysis of genes and orthologous proteins yielded a total of 2,671 orthologous proteins and 1,487 KEGG-annotated metabolic proteins obtained through prior analysis using KOALA-KEGG. Other results include 4,349 orthologous gene clusters, 4,186 Pfam annotations, 2,540 gene ontology assignments, 77 carbohydrate-active enzymes, and reconstruction of metabolic networks (BiGG) with a total of 1,064 nodes. These results were summarized using the EggNOG-Mapper online tool and are presented in
Table 2.
The genome sequencing data is available in GenBank at NCBI under the accession number JBAKHK000000000.
3.2. Comparative Analysis of the Complete Genome of Enterobacter sp. UNJFSC 003
The comparative analysis of the complete genome based on Average Nucleotide Identity (ANI) of 16 different Enterobacter strains showed that Enterobacter sp. UNJFSC003 was closely related to nine Enterobacter strains with ANI values ranging from 87.8 to 87.3 as the upper limit, and the other 5 strains had ANI values ranging from below 86.5 to 80.9 (
Figure 2 and
Table 3). Our strain UNJFSC003 and the Crenshaw strain of E. asburiae with GenBank accession GCA_016027695 had a maximum ANI value of 87.8%. This ANI value indicates a relationship between our strain and this bacterium with an ANI value below 95%, considering that the threshold for classifying strains is (≥) 95% ANI [
46]. Based on this, we can suggest that our strain is a new native Enterobacter isolated from soil samples from Lomas de Lachay [
47,
48].
3.3. Analysis of the Pangenome of Enterobacter sp. UNJFSC 003
The pangenome model developed with the participation of 16 Enterobacter strains, including 12 Enterobacter strains, 3 strains of other Enterobacteriaceae species, and our strain, indicated a close genetic relationship between UNJFSC003 and FDAARGOS-1 (E. cancerogenus). Both strains grouped into a single node, indicating their similarity in identical gene clusters. Ten other Enterobacter strains with less variation in genes were observed in the matrix and separated into another node. The three Enterobacteriaceae strains, including Leclercia adecarboxylata strain ATCC2316, Klebsiella spallanzanii strain SB6411, and Citrobacter freundii ATCC8090, exhibited greater variation, along with Enterobacter soli strain ATCC-BAA-2. The pangenome of the 16 Enterobacter strains consists of 381 core genes, 75 soft core genes, 4778 shell genes, and 38044 cloud genes, totaling 43278 genes. Interestingly, Enterobacter sp. UNJFSC003 harbors unique gene clusters presumably present with E. cancerogenus strain code FDAARGOS-1 (
Figure 3).
3.4. The Strain UNJFSC 003 Harbors Genes Encoding Enzymes for the Bioremediation of Hydrocarbons
The HADEG database identified our strain as a potential organism for developing hydrocarbon bioremediation methods, finding two types of hydrocarbons: alkanes (a total of 5) and aromatics (a total of 16). Additionally, two 2 enzymes for biosurfactant production, all determined through orthology analysis, were identified (
Figure 4). The orthology analysis performed by proteinortho with the HADEG database identified proteins involved in the metabolic sub-pathways of the genome. This detailed and specific analysis identified two different types of hydrocarbons (alkanes and aromatics) and biosurfactants, as well as the specific number of enzymes for the bioremediation of aromatic hydrocarbons and alkanes (
Figure 4).
3.5. In Silico Interaction Protein-Protein and Heat-Map of Proteins Involved in Bioremediation
The analysis of protein-protein interactions indicated possible interactions among the various proteins involved in hydrocarbon bioremediation. Indeed, the interaction among enzymes involved in the bioremediation of aromatic hydrocarbons ensures their mutual connection for each different HC, such as hpc genes totaling 6 in the homoprotocatechuate pathways, paa genes totaling 6 in the phenylacetate pathway, and hpa genes in aerobic alkane degradation (
Figure 5a). In the localization of the proteins involved in hydrocarbon bioremediation, it can be observed that 17 proteins predicted by PSORT are located in the cytoplasm and 6 are located in different sites of the bacterial cell (
Figure 5b).
3.6. Molecular Modelling and Docking Analysis
The three-dimensional molecular structures of 6 proteins were constructed using deep modeling and Rosetta. These proteins are related to the bioremediation of aromatic hydrocarbons, specifically the proteins predicted for the bioremediation of aromatic hydrocarbons involved in the subpathways of homoprotocatechuate. According to the validation, all the models can be considered reliable, as the Z-score, Ramachandran analysis, and ERRAT provided satisfactory or good scores.In this context, the best model corresponds to hpcB, based on the results obtained: 95.4% in the Ramachandran plot, a Z-score of -11.36, and an ERRAT score of 97.87. On the other hand, hpcD and hpcE correspond to the worst models, with Ramachandran plot results of 94.6%, a Z-score of 4.79, and an ERRAT score of 83.46.
On the other hand, none of the proteins have shown a signal peptide, indicating that they are all intracellular, with each protein containing between 126 and 488 amino acids. Additionally, all the obtained models displayed a monomeric folded structure. In the six predicted proteins, all possess active sites for the degradation of polycyclic aromatic hydrocarbons (
Table 4).
Table 4.
Evaluation of hpc Protein Models from Enterobacter sp. UNJFSC003: Physicochemical Analysis and Signal Peptide.
Table 4.
Evaluation of hpc Protein Models from Enterobacter sp. UNJFSC003: Physicochemical Analysis and Signal Peptide.
Modelling Server |
Protein Modelling |
Errat |
Error/Warning/Plass |
R. Plot% |
Z-score |
SignalP |
Num. aa |
pI |
Mol Weight |
GRAVY |
trRosseta |
hpcB |
97.87 |
2/4/3 |
95.4% |
-11.36 |
No |
283 |
5.72 |
31721.04 |
-0.228 |
trRosseta |
hpcC |
93.38 |
2/4/3 |
94.0% |
-11.41 |
No |
488 |
6.25 |
53130.86 |
-0.112 |
trRosseta |
hpcD |
83.49 |
1/3/4 |
94.6% |
-4.79 |
No |
126 |
5.82 |
14336.35 |
-0.202 |
trRosseta |
hpcE |
92.44 |
3/2/4 |
90.6% |
-9.65 |
No |
425 |
5.03 |
46227.37 |
-0.146 |
trRosseta |
hpcG |
93.44 |
0/4/4 |
91.8% |
-6.87 |
No |
267 |
5.69 |
29481.68 |
-0.081 |
trRosseta |
hpcH |
96.85 |
0/4/5 |
94.7% |
-8.81 |
No |
265 |
5.73 |
28175.33 |
0.107 |
The hpcB model has a topology of 16 alpha helices, 23 beta sheets, and 33 coils. In the respective molecular docking analysis, this predicted protein obtained an energy value of -10.1 kcal/mol, interacting with the amino acids TRP58, ASN67, LEU121, LEU123, and GLU124; On the other hand, the hpcC model has a topology of 14 alpha helices, 19 beta sheets, and 34 coils. In the respective molecular docking analysis, this predicted protein obtained a value of -10.7 kcal/mol, interacting with the amino acids GLN95, LEU267, PHE268, SER272, and GLN316 And does not form any hydrogen bonds; on the other hand, hpcD has a topology of 3 alpha helices, 6 beta sheets, and 10 coils. In the respective molecular docking analysis, this predicted protein obtained an energy value of -9.7 kcal/mol, interacting with the amino acids GLY81, GLU82, PHE85, and PHE105; in the other analyses, hpcE has a topology of 18 alpha helices, 12 beta sheets, and 32 coils. In its molecular docking analysis, this protein obtained a value of -12.0 kcal/mol, interacting with the amino acids TYR28, ASN29, THR30, TYR103, ARG315, and TYR314; additionally, hpcG has a topology of 11 alpha helices, 10 beta sheets, and 19 coils. In the respective molecular docking analysis, this predicted protein obtained an energy value of -9.1 kcal/mol, interacting with the amino acids TYR128, ASP181, LEU185, HIS210, THR161, and despite having a histidine amino acid, it does not form any hydrogen bonds; Finally, in the group of proteins that degrade homoprotocatechuate, hpcH has a topology of 12 alpha helices, 8 beta sheets, and 21 coils. In the respective molecular docking analysis, this predicted protein obtained an energy value of -9.0 kcal/mol, interacting with the amino acids TRP19, GLY21, LEU212, and VAL234. Similar to hpcG, it also does not form any hydrogen bonds despite the presence of histidine.
4. Discussion
The microorganisms with efficient biodegradation of aromatic hydrocarbons, belonging to Enterobacter sp. and E. xianfagensis [
49,
50], have been reported for their ability to degrade both aromatic hydrocarbons and petroleum. This includes real-time experimental methods and computational molecular docking. Additionally, studies have explored their production of biosurfactants like rhamnolipids, as seen in E. asburiae [
51]. In another study, E. ludwigii strains found as endophytes in plants were discovered capable of degrading hydrocarbons from the rhizosphere [
52]. In this study, the strain of Enterobacter belonging to the species E. ludwigii possesses specific genes for the degradation of both alkanes and aromatic hydrocarbons, as well as genes involved in biosurfactant production. Thus, Enterobacter sp. UNJFSC003 exhibited a variety of enzymatic activities, including dioxygenase activity attributed to the LigB domain in hpcB. This family of enzymes performs the cleavage of aromatic rings as a key function in the degradation of these compounds. They also participate in the metabolism of halogenated aromatic compounds [
53], in addition, aldehyde dehydrogenase in hpcC functions enzymatically by oxidizing a broad spectrum of aliphatic and aromatic aldehydes, utilizing NAD+ and NADP+ as cofactors. These enzymes play pivotal roles in different stages of hydrocarbon degradation [
54], These enzymes also play roles in the degradation of compounds such as pesticides and other toxic substances [
55]. The enzymatic function of CHMI (5-carboxymethyl-2-hydroxymuconate isomerase) in the decomposition of aromatic compounds is studied within the context of hpcD. The structure of this protein was predicted using artificial intelligence through AlphaFold 2, developed by DeepMind [
56], user is studying bioremediation of aromatic compounds in Pseudomonas aeruginosa PAO1. Additionally, they conducted an experimental study cloning the gene in Escherichia coli to produce large quantities of this protein and determine its properties [
57]. Finally, the hydrolase activity in hpcE involves two present domains of hydrolase, whereas hpcG has a single present domain of hydrolase. The latter catalyzes the hydration of a carbon-carbon double bond without the assistance of any cofactor. The crystal structure of hpcG features the FAH fold, which is quite common among enzymes involved in the degradation of aromatic compounds [
58], Lastly, hpcH also obtained a functional domain in the degradation of aromatic compounds according to the analyses conducted by InterProScan [
59]. These studies were conducted using the proteome obtained from annotation developed by Prokka, orthology methods employing the Proteinortho tool, the HADEG database, and active sites identified by InterProScan.
In the present study, our bacteria isolated from soil samples from Lomas de Lachay was observed to form star-shaped colonies and was classified as Gram-negative by Gram staining. Its genome was sequenced using the Illumina NovaSeq system (Macrogen, Rep. of Korea). Subsequent taxonomic identity analysis characterized this bacterium as belonging to the genus E. asburiae with an ANI value of 87.85%. However, taxonomic classification of a microbial strain in the genomic era generally requires sharing an average nucleotide identity (ANI) value of >95% across multiple aligned genes [
60]. This suggests that our isolated strain does not belong to the genus E. asburiae but may instead represent a new Enterobacter species. It maintains a close nucleotide content relationship with E. asburiae and possesses unique gene clusters similar to those found in E. cancerogenus, as observed in the pangenome analysis results with 16 Enterobacter strains. In another study, the genome of strain WCHECI-C4 has an ANI of only 93.34%, leading to its classification as a new species within the genus Enterobacter, named E. chengduensis sp. nov. [
61]. The draft genome of strain UNJFSC003 has a size of 4,798,267 base pairs, 4,460 protein-coding genes, and a GC content of 54.38%. This GC content aligns with findings from the work of [
62]. Enterobacter sp. UNJFSC003 showed a genome completeness of 98.1%, contamination of 0.9%, and quality of 93.6%. In another study, Enterobacter sp. strain RIT-637 with antibacterial activities showed a genome completeness of 99.96% and contamination of 2.08% [
63]. A total of 2671 genes associated with KEGG pathways were obtained. This represents a significantly larger number of genes compared to the study by Indugo [
64], who found a total of 1567 genes in KEGG pathways. The classification of 2540 orthologous genes, 77 active enzymes in carbohydrates, 1064 metabolic networks in total, and 4329 clusters of orthologous genes. Interestingly, studies by Szczerba [
65] found a new strain of the genus Enterobacter isolated from the cow rumen, E. aerogenes LU2, with protein-coding genes involved in carbohydrate metabolism pathways, amino acid transport, carbon source transport, and nitrogen source transport. Furthermore, the genome of Enterobacter sp. UNJFSC003 showed capabilities for hydrocarbon bioremediation and production of low molecular weight (LMW) biosurfactants similar to emulsan. Microbial biosurfactants are produced as secondary metabolites or through enzymatic processes, well-known for their ability to reduce surface tension [
66]. Due to the analyses conducted, we found biotechnological potential in this genus of bacteria. In the study by Peng [
67], they isolated Enterobacter sp. T2 from contaminated sludge capable of rapidly degrading tetrabromobisphenol (TBBPA) via a key protein, haloacid dehalogenase, responsible for TBBPA degradation. In another study by El-belgati [
68], they observed highly significant positive correlations between genetic expressions and increased concentrations of heavy metals, indicating that gene expression was induced by higher concentrations of heavy metals. These facts indicate the great potential that Enterobacter strains have for degrading hydrocarbons, heavy metals, among other compounds, thanks to studies using next-generation sequencing technologies.
Through the analysis of hydrocarbon biodegradation, the results suggested the efficiency of our strain UNJFSC003, identifying enzymes related to hydrocarbon bioremediation, including alkanes and aromatics. Such result is associated with the hydrocarbon degradation effect employed by 3 strains of Pseudomonadota bacteria isolated from a century-old abandoned oil exploration well, analyzing their proteome and using the same gene and enzyme database for aerobic hydrocarbon degradation, finding similar types of hydrocarbons present in our study, among other types of hydrocarbons [
69]. Here, the in silico analysis of the putative protein interaction networks involved in hydrocarbon bioremediation, using the STRING database, revealed potential links between the homoprotocatechuate degradation pathway, specifically hpcB, and the hydroxyphenylacetate degradation system, hpaB and hpaC. An additional connection was found between the catechol degradation pathway, specifically pcaF, and all predicted proteins in the phenylacetate degradation pathway. Additionally, the two biosurfactant-producing proteins interacted with each other without revealing links to other proteins, maintaining their phylogenetic coexistence. Interactions of proteins involved in bioremediation have already been reported in other bacteria isolated from wastewater, such as Bacillus cereus AA-18 [
70]. Additionally, another study on cadmium bioremediation by Khan [
71] shows a similar interaction network to the hpc cluster in our study, involving binding proteins such as YfiY, yfmF, and yfmE. In the subcellular prediction made by the online tool PSORTb v3.0, 17 proteins were located in the cytoplasm, while the rest were found in various cellular regions. There are many methods for predicting the subcellular localization of microbial proteins, ranging from genome-based predictions using PSORTB to metagenome-based predictions with PSORTm. These methods also include the identification of new markers in aquatic and terrestrial microorganisms [
72,
73].
The molecular models of key enzymes (
Table 4 and
Figure 6) involved in the bioremediation of aromatic hydrocarbons showed highly conserved amino acids in the active sites. These results, along with the Ramachandran plot percentage, support the reliability of the modeled structures. Furthermore, all models did not form hydrogen bonds but showed high affinity energy with the benzopyrene molecule at the active sites of all modeled proteins through deep modeling, with affinity energy values ranging from -9.0 kcal/mol to -12.0 kcal/mol. This provides information about the structure and function of the hpc enzyme cluster. Therefore, the 3D models of hpc cluster enzymes strongly support their function in the bioremediation of aromatic hydrocarbons, as annotated from the genes of Enterobacter sp. UNJFSC003.
Combining genomic analysis and molecular modeling, this research has contributed to a better understanding of the great potential of strain UNJFSC 003 isolated from soil samples from Lomas de Lachay. Genomic analysis provides information about the genes involved in the bioremediation of aromatic hydrocarbons, alkanes, and biosurfactant production. Lou [
74] revealed information about the genome of various genetic functionalities, specifically the metabolism of phenanthrene and pyrene degradation in 2 new bacterial strains, Klebsiella michiganensis EF4 and K. oxytoca ETN19, isolated from a soil sample contaminated with PAHs. In the molecular docking analysis, benzopyrene was used as the ligand. Benzopyrene is listed among the priority pollutant polycyclic aromatic hydrocarbons by the United States Environmental Protection Agency [
75]. Benzopyrene has been detected in industrial waste, diesel exhaust gases, charcoal-based foods, and elevated levels of cigarette smoke, and it is also known as a recalcitrant molecule [
76]. Gram-negative bacteria with the ability to degrade polycyclic aromatic compounds have already been reported, such as in the study by Wang & Tam [
77], where the bacterium Cycloclasticus was confirmed to be a ubiquitous degrader of marine PAHs, even in subterranean marine environments. In fact, a limited number of previous studies have reported PAH degradation capabilities in the phylum Proteobacteria, including pathogenic bacteria.
Author Contributions
Conceptualization, G.C., S.C-L., C.A. and P.R-G.; Data curation, G.C., S.C-L., C.A. and P.R-G; Formal analysis, G.C., C.A. and P.R-G.; Funding acquisition, S.C-L. and C.A.; Investigation, G.C., S.C-L., C.A. and P.R-G.; Methodology, G.C., S.C-L., C.A. and P.R-G.; Project administration, S.C-L., C.A. and P.R-G.; Resources, S.C-L., C.A. and P.R-G.; Software, G.C. and P.R-G.; Supervision, S.C-L, C.A. and P.R-G; Validation, G.C. and P.R-G.; Visualization, G.C., S.C-L., C.A. and P.R-G.; Writing – original draft, G.C. and P.R-G.; Writing – review & editing, S.C-L., C.A. and P.R-G. All authors have read and agreed to the published version of the manuscript.