1. Introduction
Skeletal muscle (SkM), the largest tissue in the human body, has extraordinary regenerative ability and morphological plasticity in response to internal changes and external challenges [
1,
2]. SkM progenitor cells play a central role in SkM formation and repair. Postnatally, there is a special mechanism of muscle repair involving the activation of muscle satellite cells, SkM stem cells that are usually dormant and lodged under the sarcolemma (the specialized outer membrane of SkM fibers) [
3]. In response to SkM damage, satellite cells become activated to form myoblasts (SkM progenitor cells), which then differentiate and fuse with damaged myofibers, whose muscle cells (myocytes) contain hundreds or even thousands of nuclei.
The complexity of SkM formation and repair and the sensitivity of SkM to physiological and clinical changes make the study of its epigenetics, including differentially methylated DNA regions (DMRs), of special interest. Increases in muscle mass,
e.g., due to exercise, rely on hypertrophy (because myocytes cannot divide) to enlarge the fibers radially and longitudinally [
4]. Hypertrophy depends on both transcriptional changes in myofiber nuclei and activation of satellite cells [
3,
5]. Muscle injury can affect DNA methylation profiles of satellite cells [
6]. During activation of satellite cells, there are further changes in DNA methylation and in chromatin [
6,
7]. There is overrepresentation of age-related SkM DMRs in SkM enhancer chromatin and regions around transcription start sites (TSS).
Epigenetics is also implicated in satellite cell, myoblast, and muscle fiber heterogeneity [
8,
9,
10] and in memory effects for strenuous muscle use and possibly for muscle disuse [
5,
6,
11,
12]. There are differences in expression profiles and epigenetics for SkM fiber subtypes (
e.g., fast or slow), which can interconvert [
10]. Muscle memory may be partly due to DNA methylation changes associated with SkM conditioning involving a bout of certain types of exercise followed by an interval of inactivity and a subsequent reinitiation of intense activity [
5].
During a recent analysis of genes associated with myoblast DMRs, we found that
ZNF556 (Zinc Finger Protein 556) has a strong preference for expression in both myoblasts and cerebellum, as described below. In a previous study of myoblast epigenetics using reduced representation bisulfite sequencing (RRBS) profiles, we identified
CDH15 (Cadherin 15) as a gene also displaying myoblast- and cerebellum-specific expression [
13]. Cerebellum is critical for motor coordination, cognition, and emotional processes [
14]. Cerebellum is quite distinct morphologically and functionally from other brain regions and important in neuromuscular disease [
15]. It has a transcriptome that is strikingly distinct from those of all the other brain regions [
16].
In this study, we systematically explored myoblast/cerebellum transcription associations. We first identified 20 human genes that are preferentially expressed in myoblasts and cerebellum (Myob/Cbl genes). We then investigated transcription and epigenetics relationships for Myob/Cbl genes using our newly available whole-genome bisulfite sequencing (WGBS) or enzymatic methyl-seq (EM-seq) myoblast methylomes [
17] and a recent WGBS profile for cerebellum neurons from Loyfer
et al. [
18] as well as chromatin epigenomics databases [
19]. Unlike RRBS, which covers only up to 5% of the CpGs [
20], WGBS and EM-seq allow the quantitation of methylation at essentially all the CpGs in the genome. These transcriptomic/epigenomic analyses using data from diverse tissues and cell cultures for comparison to myoblasts and cerebellum elucidate differences and similarities in regulation of genes that are preferentially expressed in myoblasts and cerebellum, two very different kinds of cell populations.
3. Discussion
From a genome-wide examination of transcriptomics and epigenomics, we identified 20 genes (Myob/Cbl genes) that have a strong preference for transcription in myoblasts (mesodermally derived SkM progenitor cells) and cerebellum, a highly dissimilar cell population (ectodermally derived). The co-expression of genes in myoblasts and cerebellum, rather than in other brain regions (
Table S3), reflects the much higher number of genes preferentially expressed in cerebellum than elsewhere in the brain ([
16,
44] and
Table S2). In contrast, myoblasts were not unusual compared to several other progenitor cell strains in having a small subset of genes preferentially expressed in both that cell culture (HUVEC, NHEK, or NHLF) and cerebellum (
Table S5). However, only Myob/Cbl genes had a significant ontological association with TFs (
Table S5;
ZIC1, EN2, PAX3, VAX2,
LBX1, and
ZNF556), which suggests a special transcriptional relationship for myoblasts and cerebellum, although for only a very small percentage of the genome. Among the 20 Myob/Cbl genes, the strongest association of myoblast DNA hypomethylation with gene expression was seen for
ZNF556. Very little is known about its function, but the encoded protein has the structural hallmarks of KRAB zinc finger TFs, which often act as repressor proteins [
34,
45]. The
ZNF556 promoter overlaid a Myob-hypom DMR, which was highly methylated in all 17 examined non-expressing ENCODE cell populations and hypomethylated in promoter chromatin specifically in all
ZNF556-expressing cell populations (
Figure 2 and
Figure 3). Its hypomethylation and expression in HepG2, a hepatocarcinoma cell line, provide an example of cancer DNA hypomethylation coupled with expression of a gene normally active in a small number of very different cell types.
In predicting the relationship of promoter DNA methylation to gene expression, the CpG content of the promoter region is critical [
46]. The
ZNF556 Myob-hypom DMR (TSS -0.7 to +1.3 kb) almost fits the UCSC Genome Browser’s definition of a CGI except that its observed CpG density/expected CpG density was 0.58 instead of >0.6. This promoter DMR would be classified as an intermediate CpG promoter by the definition of Weber
et al. [
46]. CGI promoters are predominantly constitutively unmethylated or very lowly methylated. Normal or disease-acquired methylation of CGI promoters and intermediate-to-high CpG promoters is strongly associated with transcription repression in human cells [
46,
47,
48]. Therefore, it is likely that methylation of the
ZNF556 DMR
in vivo helps silence it. To test this, we determined the effect of DNA methylation on
ZNF556 promoter activity in reporter gene assays.
In vitro methylation of cloned
ZNF556 DMR sequences reduced promoter activity by about half (
Figure 3). Downregulation
in vivo by DNA methylation might be much stronger because we tested the effect of DNA methylation on constructs that did not include the further downstream, CpG-enriched parts of the DMR (
Figure 3). We conclude that the
ZNF556 DNA hypomethylation
in vivo, coupled with promoter chromatin at this region, enables tissue/cell-specific transcription of this gene.
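The CpG criterion discussed above for the ZNF556 promoter DMR (an observed/expected CpG ratio of 0.58 versus a cutoff of >0.6) can be illustrated with a minimal Python sketch. It assumes the widely used formula for this ratio, (number of CpG × N) / (number of C × number of G), together with the commonly stated CGI thresholds of length >200 bp and GC content ≥50%. The function names and the toy sequences are hypothetical, and this is not necessarily the exact procedure used to obtain the value reported here.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G).

    Assumed formula; returns 0.0 when the sequence has no C or no G.
    """
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    n_cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return n_cpg * len(s) / (n_c * n_g)


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    s = seq.upper()
    if not s:
        return 0.0
    return (s.count("G") + s.count("C")) / len(s)


def meets_cgi_criteria(seq: str,
                       min_len: int = 200,
                       min_gc: float = 0.5,
                       min_obs_exp: float = 0.6) -> bool:
    """Apply the three assumed CGI thresholds to a single sequence."""
    return (len(seq) > min_len
            and gc_fraction(seq) >= min_gc
            and cpg_obs_exp(seq) > min_obs_exp)


if __name__ == "__main__":
    toy = "CG" * 150  # hypothetical 300-bp sequence, not a real locus
    print(cpg_obs_exp(toy), gc_fraction(toy), meets_cgi_criteria(toy))
```

Under these assumptions, a region can satisfy the length and GC-content thresholds and still fail the observed/expected cutoff, which is the situation described for the ZNF556 DMR.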
Unexpectedly, the region immediately upstream of the
ZNF556 TSS (TSS -0.6 to +0.1 kb) did not suffice for appreciable promoter activity in the reporter gene assays even though it overlapped the only prominent DNase-seq peak at the promoter region in
ZNF556-expressing samples. This expression-linked open-chromatin region likely cooperates with adjacent upstream or downstream sequences, which are hypomethylated in myoblasts and cerebellum (
Figure 3). The unusually high concentration of DNA repeats in the TSS-upstream region, especially the LTRs (Long Terminal Repeats,
Figure 3A, bottom), might influence the activity and/or the methylation status of the promoter [
49].
The hypomethylation of the
ZNF556 promoter region in germline cells can help explain the finding that high methylation of its promoter is the default state in non-expressing cells. The
ZNF556 promoter region was hypomethylated in ovarian stromal cells, oocytes, spermatogonia, and early spermatids as well as zygotes, 2-cell embryos, and 4-cell embryos (but not morula), and all of these cell populations contain
ZNF556 RNA (
Figure 4; [
28,
29]). The high level of methylation of the
ZNF556 CGI promoter in most somatic cell types is an exception to the strong tendency of CpG-rich promoters to have little or no DNA methylation in normal somatic cell types [
50]. The best characterized exception to this general finding is for the set of genes specifically expressed in the germline [
46]. Some CGI promoters are highly methylated in somatic tissues and hypomethylated in sperm [
51] as a result of large losses in DNA methylation from many parts of the genome in mammalian primordial germ cells, which ultimately give rise to both sperm and oocytes [
52]. For some of the above-mentioned cell types,
e.g., zygotes,
ZNF556 promoter hypomethylation might be a memory of former transcriptional activity and possibly a poised state for future activation. This relationship of
ZNF556 expression to gametogenesis and pre-implantation embryos suggests a role for
ZNF556 in pre-implantation embryos that might be shared by its neighbor
ZNF555, as evidenced by the even higher enrichment of
ZNF555 RNA in zygote, 2-cell, and 4-cell embryos than was seen for
ZNF556 (
Figure 4). Therefore, we propose that during evolution, first,
ZNF556 was expressed just in the germline and possibly the pre-implantation embryo, and only later was there repurposing of the associated epigenetics in myoblasts and cerebellum to extend the tissue/cell specificity of this TF and its functionality.
Two other Myob/Cbl genes,
CDH15 and
TRIM72, had DNA hypomethylation in the promoter region that likely contributes to the high transcription of these genes in myoblasts and cerebellum (
Table 1). Previously, we demonstrated that a region of RRBS-detected myoblast hypermethylation at an intragenic CGI in
CDH15 is a methylation-inhibited cryptic promoter [
13]. However, because of the limited coverage by RRBS, that study did not find the long region of low DNA methylation that stretches from upstream of the
CDH15 TSS to far downstream into intron 1 in myoblasts and cerebellum, as seen in WGBS profiles (
Figure 1). The myoblast- and cerebellum-hypomethylated DNA regions overlapping the
CDH15 and
TRIM72 promoters are transcription-associated extensions of adjacent, constitutively unmethylated regions that overlap CGIs (
Figure 1 and
Figure 5).
TRIM72 presents the unusual example of a gene with a Myob-hypom DMR at its promoter and a Myob-hyperm DMR overlapping the promoter of its intronic, protein-coding gene,
PYDC1 (
Figure 5).
PYDC1 resides in intron 1 of
TRIM72 and is positioned antisense to it. Unlike its
TRIM72 host gene,
PYDC1 was silenced in myoblasts and SkM but selectively expressed in cerebellum (TPM ratio 4.7 for cerebellum
vs. the average of 10 other brain regions). RNA levels for
TRIM72 are almost 4-fold lower in cerebellum than in SkM, which might be due, in part, to dampening of
TRIM72 transcription in cerebellum by the clashing movement of RNA polymerase complexes from the
PYDC1 TSS. This dichotomous expression of two overlapping genes in cerebellum and myoblasts is probably achieved, in part, by the Myob/SkM-hyperm DMR that extended from the
PYDC1 promoter region. In contrast, cerebellum lacked DNA methylation at and around
PYDC1 as did most tissues with a silent
PYDC1 gene. This indicates that the
PYDC1 CGI promoter is constitutively unmethylated regardless of expression status with the exceptions of myoblasts and SkM (
Figure 5). This SkM-lineage hypermethylation overlaps enhancer chromatin or mixed enhancer/promoter chromatin. The hypermethylation probably helps establish or maintain an intragenic
TRIM72 enhancer in the SkM lineage at a region that is an active, unmethylated
PYDC1 promoter in cerebellum and repressed chromatin in most other cell populations.
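The kind of expression ratio quoted above for PYDC1 (TPM in cerebellum relative to the average TPM of 10 other brain regions) can be sketched in a few lines of Python. The function name, the optional pseudocount, and all TPM values below are hypothetical and chosen only to illustrate the arithmetic; they are not data from this study.

```python
from statistics import mean
from typing import Sequence


def preferential_ratio(target_tpm: float,
                       other_tpms: Sequence[float],
                       pseudocount: float = 0.0) -> float:
    """Ratio of TPM in a target tissue to the mean TPM of comparison tissues.

    A small pseudocount can be supplied to stabilize ratios for genes that
    are nearly silent in the comparison tissues (an assumption of this
    sketch, not a described step of the analysis).
    """
    if not other_tpms:
        raise ValueError("at least one comparison tissue is required")
    background = mean(other_tpms) + pseudocount
    if background == 0:
        # Undefined ratio: report infinity for an expressed gene, 0 otherwise
        return float("inf") if target_tpm > 0 else 0.0
    return (target_tpm + pseudocount) / background


if __name__ == "__main__":
    # Hypothetical values: one target tissue and ten comparison regions
    cerebellum_tpm = 9.4
    other_regions_tpm = [2.0] * 10
    print(round(preferential_ratio(cerebellum_tpm, other_regions_tpm), 2))
```

Averaging over the comparison regions, rather than taking their maximum, gives a more permissive measure of preferential expression; a stricter screen could substitute `max(other_tpms)` for the mean.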
Among the Myob/Cbl genes, Myob-hyperm DMRs were more frequent than Myob-hypom DMRs (
Table 1). Some of this differential methylation may help direct alternative promoter usage for
ANK1, CNPY1, DOK7, and
MCF2L (
Figure 6,
Figure S7, 8, and
Figure S10). The alternative promoter usage for these genes not only changes the polypeptide that the gene encodes, but also, in the case of
ANK1, might affect the efficiency of production of a co-transcribed intronic miRNA, miR-486 (
Figure 6). This miRNA plays pivotal roles in myogenesis and is important for normal heart formation in mice [
37]. Moreover, it is being considered for therapeutic disease modulation because its upregulation in dystrophic mouse models can reduce symptoms of muscular dystrophy [
37]. In addition, in some cancers, miR-486 is an oncogenic marker and may play a role in oncogenesis [
53,
54]. DNA hypermethylation at multiple
ANK1 CGI promoter regions is associated with cancer-linked changes in miRNA levels [
53]. However, the epigenetics of the myoblast/SkM/heart promoter in cancers has received little attention probably because it does not overlap a CGI. There might be cancer-associated hypomethylation of this
ANK1 promoter, which could influence miR-486 levels as well as
ANK1 isoform abundance ratios.
Some Myob/Cbl genes exhibit only moderately low steady-state levels of RNA in myoblasts but much higher levels in cerebellum (
Table 1). Five of these genes (
KCNJ12, ST8SIA5, ZIC1, EN2, and
VAX2) displayed Myob-hyperm DMRs upstream and/or downstream of their core promoter region that were missing in cerebellar neurons (
Table 1,
Figures S2, S4-S6) and were probably downmodulating expression of these genes in myoblasts vs. cerebellum. In a previous study of
TBX15, a TF-encoding gene preferentially expressed in myoblasts and SkM but not in any brain region, we used reporter gene assays and
in vitro methylation to demonstrate that enhancer and promoter activity of Myob-hyperm DMR sequences upstream or downstream of its TSS was strongly suppressed by DNA methylation [
17]. Hypermethylation upstream and downstream of the
PAX3 promoter in both myoblasts and cerebellum is probably helping to keep transcription low in both cell populations (
Table 1,
Figure 7).
In addition to regulating usage of alternative promoters, directing usage of intronic promoters, and silencing cryptic promoters, intragenic hypermethylation in transcribed genes can facilitate movement of the RNA polymerase complex across the gene body of actively transcribed genes to regulate alternative splicing [
55,
56]. It has also been proposed that such intragenic methylation associated with transcription may be simply a consequence of the recruitment of DNMT3 enzymes by H3K36me3 through their PWWP domain. Twelve of the Myob/Cbl genes had intragenic Myob-hyperm DMRs but only three of them (
CDH15, ANK1, and
MCF2L) had DMRs that overlapped H3K36me3-enriched chromatin in myoblasts (Txn-chrom,
Figure 1,
Figure 6, and
Figure S9). Another mechanism for intragenic or intergenic DNA hypermethylation being positively associated with gene expression is that DNA methylation may decrease the spread of repressive H3K27me3-enriched chromatin in many chromatin/DNA contexts [
57,
58]. Although epigenetic profiles suggested that this is not the function of most of the examined DNA hypermethylation at Myob/Cbl genes, it might be the case for the
VAX2 TSS-downstream DNA hypermethylation in myoblasts and cerebellum (
Figure S5).
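The comparison described above, asking which genes have intragenic hypermethylated DMRs that overlap H3K36me3-enriched chromatin, is an interval-overlap question. The following Python sketch shows one simple way to pose it, assuming BED-style half-open coordinates. The `Interval` type, the function names, the gene labels, and all coordinates are hypothetical placeholders rather than values from this study, and the sketch is not the pipeline used for the analyses reported here.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED-style assumption)
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_with_marked_dmrs(dmrs_by_gene: Dict[str, List[Interval]],
                           marked_regions: List[Interval]) -> List[str]:
    """Return genes having at least one DMR that overlaps a marked region."""
    return sorted(
        gene
        for gene, dmrs in dmrs_by_gene.items()
        if any(overlaps(d, m) for d in dmrs for m in marked_regions)
    )


if __name__ == "__main__":
    # Hypothetical DMRs and H3K36me3-enriched segments
    dmrs = {
        "GENE_A": [Interval("chr1", 100, 200)],
        "GENE_B": [Interval("chr1", 500, 600)],
        "GENE_C": [Interval("chr2", 100, 200)],
    }
    h3k36me3 = [Interval("chr1", 150, 300), Interval("chr2", 200, 400)]
    print(genes_with_marked_dmrs(dmrs, h3k36me3))
```

The nested comparison is quadratic in the number of intervals, which is adequate for a short gene list; genome-wide comparisons would normally rely on sorted intervals or an interval tree.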
A caveat in our study is that DNA methylation levels at
cis-regulatory elements can vary with the exact nature of the cells or tissues studied (physiology, cell composition, age, and health status of the donor, and for SkM, the muscle location, and fiber type [
8,
59,
60,
61]). However, these changes are usually smaller than the strong differences in DNA methylation that are tissue-specific. Another caveat is that WGBS does not distinguish between genomic 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC), which often have different biological effects [
62,
63]. This is not a complication for Myob-hyperm DMRs because we found that myoblast cell strains have little or no 5hmC at tested CpG sites in 13 examined myoblast RRBS-delineated DMRs, including one in the
CDH15 intragenic Myob-hyperm DMR at its cryptic intragenic promoter [
13]. This is consistent with the loss of genomic 5hmC reported upon passage of mouse embryonic fibroblast cell strains [
64]. In the case of tissues, especially cerebellum, how much of the WGBS signal for DNA methylation is actually 5hmC is an important consideration [
13,
63,
65]. For example, we previously found that in SkM and cerebellum, a tested CpG in the
CDH15 Myob/SkM-hyperm DMR has about twice as much 5hmC as 5mC [
13].
An illustration of the need to consider the 5hmC content of hypermethylated regions in cerebellum comes from the important findings of James
et al. [
59,
66]. Their studies suggest a role for the homeobox TF EN2 (encoded by a Myob/Cbl gene) in autism spectrum disorder in addition to its pivotal contributions to cerebellum development [
67]. They found that a 0.15-kb region ~3 kb upstream of the
EN2 TSS had more 5hmC as well as 5mC in patient samples than in matched controls. These epigenetic changes were positively associated with more
EN2 RNA and protein. We observed that this region is partially methylated in control cerebellum in contrast to little or no methylation in other studied tissues and cell types (
Figure S7E and G). Moreover, Szulwach
et al. [
68] found that the ~4–8 kb region upstream of the mouse
En2 gene in cerebellum contains peaks of 5hmC overlapping a previously identified embryonic enhancer for the
En2 gene. Several of the human cerebellum-hypermethylated regions that we observed upstream of the
EN2 promoter are adjacent to DNase-seq peaks. They might help demarcate enhancers or, if they contain sufficient 5hmC, counteract binding of the repressive MECP2 protein [
59,
67,
69].
For some of the Myob/Cbl genes there are apparent functional relationships between myoblasts and cerebellum. Nine of these genes (
ANK1, CDH15, DOK7, FNDC5, MCF2L, TRIM72, CHRD, KCNJ12, and
PTPRR) encode proteins localized mostly or in part to the plasma membrane, and the first six of these were expressed at moderate to high levels in both myoblasts and cerebellum (
Table 1 and
Table S5). Regulation of cell-cell interactions is critical for controlling neuronal function [
70] as well as for regulating myoblast fusion [
71,
72]. One of these plasma membrane-associated Myob/Cbl genes with known myoblast and brain functions is the above-mentioned
CDH15. Its encoded protein is a cadherin implicated in fusion of myoblasts to form multinucleated myotubes
via its role in cell-cell interactions [
23,
71] and, on the basis of studies of mutation-linked intellectual disability syndromes in which the mutations alter cell-cell contacts, in intellectual function [
73,
74].
Another functional relationship shared by multiple Myob/Cbl genes is that six of them (
ZNF556, EN2, ZIC1, PAX3, LBX1, and
VAX2) encode TFs, four of which are homeobox-containing TFs. Both “transcription factor activity” and “homeobox” categories were significantly overrepresented among the 20 Myob/Cbl genes (
Table S5). Precise modulation of levels of expression at different times in development or in response to physiological changes is especially important for such proteins [
67,
75] and often requires changes in epigenetics. Some of the regulation of promoters and enhancers precisely modulating expression of pivotal Myob/Cbl genes is likely to involve binding of TFs that are specific for either the SkM or the neural lineages (
Figure S11). In contrast, TFs encoded by Myob/Cbl genes have dual myoblast and cerebellum specificities. Two Myob/Cbl genes,
PAX3 and
LBX1, code for TFs that are involved in both skeletal muscle development and neuronal differentiation (
Table S5; [
72,
76,
77,
78]).
PAX3 is implicated in the regulation of transcription of three other Myob/Cbl genes,
EN2,
ANK1, and
LBX1 [
79,
80]. LBX1, EN2, and VAX2 TFs were predicted to bind to Myob-hypom DMRs at
ZNF556 and
ANK1 promoters. We propose that epigenetic regulation of expression of TF-encoding Myob/Cbl genes in these two dissimilar cell populations not only helps regulate their expression, but also indirectly regulates the tissue/cell-specificity of other Myob/Cbl genes.